Abstract


A  deductive  database  consists  of  a  set  of  stored  facts,  and  a  set  of  logical  rules  (typically, 
recursive  Horn  clauses)  that  are  used  to  manipulate  these  facts.  A  number  of  optimizations 
in such databases involve the transformation of sets of logical rules (programs) to simpler,
more  efficiently  evaluable  programs.  We  consider  a  class  of  optimizations  in  which  the 
transformation  is  a  simple  syntactic  restriction  on  the  form  of  the  original  program,  and  in 
which  the  correctness  of  the  transformation  indicates  the  existence  of  a  normal  form  for  the 
proof trees generated by the program. For example, the existence of basis-linearizability in
a  nonlinear  program  indicates  that  the  program  is  inherently  linear,  and  permits  the  use 
of  special-purpose  query  evaluators  for  linear  recursions.  The  canonical  example  of  a  basis- 
linearizable  program  is  the  program  that  computes  the  transitive  closure  of  a  binary  relation; 
the  corresponding  normal  form  for  the  proof  trees  is  that  of  right-linearity.  Similarly,  if  a 
program is sequencable, then it is conducive to a pipelined evaluation. In addition, the
existence of k-boundedness in a program permits the elimination of recursion overhead
in  evaluating  the  program.  We  investigate  the  complexity  of  detecting  such  optimization 
opportunities,  and  provide  correct  (but  not  always  complete)  algorithms  for  this  purpose. 

Each  of  the  problems  that  are  mentioned  above  may  be  described  in  terms  of  the 
subtree-elimination problem, which we define and analyze. We relate the detection of
basis-linearizability, sequencability and 1-boundedness to the complexity classes NC, P and NP,
and show that the first two of these problems are, in general, undecidable. The techniques
used in our analysis provide a complete description of the complexity of deciding the
equivalence of conjunctive queries (single-rule, nonrecursive programs), and tight undecidability
results for the detection of program equivalence.
Chapter 1


Subtree  eliminations 


1.1  Introduction 

A  deductive  database  system  represents  the  use  of  predicate  logic  as  a  programming  language 
for  database  systems.  One  may  think  of  this  programming  language  as  the  extension  of 
relational  algebra  ([11])  through  the  use  of  recursion.  This  extension  provides  a  strict 
increase  in  the  expressive  power  of  the  database  query  language  ([2]),  but  makes  query 
evaluation  potentially  more  expensive;  that  is,  a  general-purpose  query  evaluator  is  likely 
to  be  inefficient  when  applied  to  a  “simple”  program.  Many  of  the  optimization  strategies 
that have been incorporated into the experimental deductive database systems currently
under  construction  ([23,  22,  21],  for  example)  are  based  on  the  recognition  of  programs  on 
which  limited  yet  efficient  query  evaluators  may  be  used.  In  this  dissertation,  we  provide  an 
alternative  optimization  strategy:  the  replacement  of  programs  by  semantically  equivalent 
but  syntactically  simpler  programs,  such  that  efficient  algorithms  may  be  used  with  respect 
to  the  transformed  programs.  The  optimizations  that  we  investigate  are  based  on  the 
detection  of  “normal  forms”  for  the  proof  trees  generated  by  the  program  in  question.  The 
problems  that  we  address  are  decision  problems;  that  is,  given  a  program  and  a  normal  form, 
we  ask  whether  the  normal  form  applies  to  the  given  program.  In  this  thesis,  we  present  a 
uniform  framework  for  the  description  of  normal  forms,  a  mechanism  for  the  construction 
of  conditions  that  are  sufficient  (but  not  always  necessary)  for  the  detection  of  each  such 
normal form, and complexity results for the detection of three common normal forms. Our
results have implications for the complexity of deciding equivalence among recursive and
nonrecursive  programs. 

1.2  Deductive  databases 

For our purposes, a deductive database system¹ consists of a finite set of stored ground
facts (the extensional database or EDB), and a finite set of rules (Horn clauses) that are
used to manipulate the EDB. A set of rules is termed a program. The program comprises
the intensional database or IDB. Relations are defined in terms of predicates; that is, for
any predicate p, the relation for p is the set of tuples d such that p(d) is true. In this
report, we will use the terms “predicate” and “relation” interchangeably. A predicate that
corresponds to an EDB relation is termed an extensional or EDB predicate, and a predicate
that is defined by a rule is termed intensional or IDB. We assume without loss of generality
that no predicate is both intensional and extensional.

¹ See [35, 36] for a comprehensive treatment.

1.2.1  Syntax 

Programs  will  be  written  using  Prolog  syntax.  A  rule  is  of  the  form 

p(X) :- b_1(Y_1), ..., b_n(Y_n).

The “:-” represents the “if” operator, and a comma represents the “and” operator. The
atomic formula (atom) p(X) is termed the head of the rule, and the conjunction on the
right of the “if” symbol is termed the body of the rule. Each atomic formula in the body
is termed a subgoal. The rule defines the predicate p, and p is hence intensional. The
variables appearing in the head of the rule are termed distinguished, and all other variables
are termed nondistinguished. Distinguished variables are universally quantified over the
rule, and nondistinguished variables are implicitly existentially quantified in the body of
the rule. That is, if W_1, ..., W_k are the distinguished variables in the rule and Z_1, ..., Z_m
are the nondistinguished variables, then the rule represents the formula

∀W_1, ..., W_k ((∃Z_1, ..., Z_m b_1(Y_1) ∧ ... ∧ b_n(Y_n)) ⊃ p(X))

Example 1.1 The following program P consists of the two rules r_1 and r_2, and defines the
intensional predicate p. We assume that b is an extensional predicate.

r_1: p(X,Y) :- p(X,U), p(U,Y).
r_2: p(X,Y) :- b(X,Y).

p(X,Y) is the head of each rule. The conjunction p(X,U), p(U,Y) is the body of rule r_1,
and b(X,Y) is the body of rule r_2. The meaning of rule r_1 is: for all X and Y, p(X,Y) is
true if for some U, p(X,U) and p(U,Y) are both true; that is, r_1 represents the formula

∀X, Y ((∃U p(X, U) ∧ p(U, Y)) ⊃ p(X, Y))


□ 


If a predicate p appears in the head of a rule and q appears in the body of the rule,
then p is said to depend on q. A predicate p is termed recursive if p depends transitively
upon itself, and a rule is termed recursive if a predicate q appearing in the body of the
rule depends transitively upon the predicate p appearing in the head of the rule. All other
predicates and rules are termed nonrecursive. A program is termed recursive if any rule is
recursive, and nonrecursive otherwise. In the example above, the predicate p and rule r_1 are
recursive (so the program is recursive), and the predicate b and the rule r_2 are nonrecursive.
A nonrecursive rule is also termed a basis or initialisation rule. A rule is said to be linear if
at most one subgoal in the rule is intensional, linear recursive if exactly one subgoal in the
body is recursive with the head of the rule, and bilinear if exactly two subgoals are recursive
with the head. Rule r_1 in Example 1.1 is bilinear, and rule r_2 is linear. Rule r_1 in Example
1.2 below is also linear. A program is termed linear if every rule in the program is linear,
and linear recursive if every rule is linear recursive or nonrecursive.

1.2.2  Semantics 

The  accepted  semantics  for  Horn-clause  programs  consists  of  the  unique  minimum  Herbrand 
model or least fixed point ([38]). The idea is that we may think of the “application” of a rule
as  the  bottom-up  (forward-chaining)  use  of  the  Horn  clause  represented  by  the  rule.  Then, 
the  relation  for  each  intensional  predicate  is  the  smallest  relation  that  satisfies  each  of  the 
rules  in  the  program;  that  is,  the  smallest  relation  that  is  closed  under  the  application  of 
the  rules  in  the  program  as  described  above.  Alternatively,  we  may  generate  the  intensional 
relations  by  initialising  each  intensional  predicate  to  be  empty,  adding  all  facts  generated 
by  basis  rules  and  then  applying  the  rules  in  a  bottom-up  (forward-chaining)  manner  until 
no  new  facts  are  generated.  The  first  of  these  views,  that  of  the  relation  for  a  predicate 
being  the  smallest  relation  satisfying  the  initialisation  rules  and  closed  under  the  recursive 
rules,  is  integral  to  the  approach  taken  by  this  report.  Since  programs  may  be  viewed  as 
generalised  closures,  we  have  focussed  our  attention  on  optimizations  that  may  be  performed 
on  programs  that  compute  such  common  relations  as  the  symmetric  and  transitive  closures 
of  a  binary  relation. 
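The bottom-up view described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an evaluator taken from the literature: the function name bottom_up_closure and the representation of relations as sets of pairs are assumptions made here, and the rules hard-coded in it are those of Example 1.1.

```python
def bottom_up_closure(b):
    """Naive bottom-up evaluation of the program of Example 1.1:
         r_1: p(X,Y) :- p(X,U), p(U,Y).
         r_2: p(X,Y) :- b(X,Y).
    Relations are represented as sets of pairs (an assumption of this sketch).
    """
    p = set(b)  # facts generated by the basis rule r_2
    while True:
        # One round of forward chaining with the recursive rule r_1.
        derived = {(x, y) for (x, u) in p for (v, y) in p if u == v}
        if derived <= p:  # no new facts: p is closed under r_1
            return p
        p |= derived


b = {("joe", "bob"), ("bob", "ann")}
print(sorted(bottom_up_closure(b)))
# [('bob', 'ann'), ('joe', 'ann'), ('joe', 'bob')]
```

The loop stops as soon as a round of rule application produces nothing new, which is the least-fixed-point condition stated in the text.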

Example 1.2 The program

r_1: p(X,Y) :- p(Y,X).
r_2: p(X,Y) :- b(X,Y).

computes the symmetric closure of the basis predicate b. The recursive rule r_1 states that p
is symmetric, the basis rule insists that b ⊆ p, and minimality is imposed by the semantics
of the program. Note that the program is linear (and linear recursive). □

Example 1.3 We repeat here the program of Example 1.1.

r_1: p(X,Y) :- p(X,U), p(U,Y).
r_2: p(X,Y) :- b(X,Y).

The first rule says that p is transitive and the second requires inclusion; thus, the program
computes the transitive closure of b. If we think of b as the “parent” relation, then this
program computes the “ancestor” relation. This program is important in that there is no
nonrecursive program computing the transitive closure of b ([2]), justifying our earlier claim
that the addition of recursion to relational algebra increases the expressive power of the
language. □
p(joe, ann)
    p(joe, bob)
        b(joe, bob)
    p(bob, ann)
        b(bob, ann)

Figure 1.1: Proof tree.


Example 1.4 The program

r_1: p(X,Y) :- p(X,U), p(U,Y).
r_2: p(X,Y) :- p(Y,X).
r_3: p(X,Y) :- b(X,Y).

computes the symmetric, transitive closure of b. □

A program is said to be safe if, in every rule, every distinguished variable appears in the
rule body. A program is said to be Datalog if it is safe and function-free; the programs of
Examples 1.1 - 1.4 are all Datalog programs. In this report, we will for the most part restrict
our attention to Datalog programs. Such programs are commonly used because they are
powerful enough for the description of many real-life problems, but always generate finite
relations from a finite database (because the Herbrand universe is finite).

1.2.3  Proof  trees 

For the purposes of this report, it will be convenient to view the relation generated by a
program  in  another  way.  We  say  that  an  atom  is  ground  if  no  variable  appears  in  it.  A  rule 
is  termed  instantiated  if  every  atom  in  the  rule  is  replaced  with  a  ground  atom  with  which 
it  unifies,  such  that  the  implied  unifications  are  consistent.  A  proof  tree  is  a  tree  in  which 
the  vertices  are  ground  atoms  such  that 

1.  The  leaves  of  the  tree  are  atoms  appearing  in  the  EDB;  and 

2. If an interior node a has the children b_1, ..., b_n, then there is an instantiated rule in
the program whose head is a and whose body is b_1, ..., b_n.

If  c  is  the  root  of  a  proof  tree,  then  we  say  that  the  proof  tree  establishes  the  fact  c. 

Example 1.5 Assume that the relation for b in Example 1.3 is {b(joe, bob), b(bob, ann)}.
The proof tree of Figure 1.1 establishes p(joe, ann). □

Now, given any program and database, the relation produced by the program for some
predicate p from this database is precisely the set of p-facts that are established by proof
trees.
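As a hedged illustration of the two conditions in the definition of a proof tree, the following Python sketch checks whether a given tree is a proof tree for the transitive-closure program of Example 1.3. The encoding is an assumption of this sketch and not part of the original text: atoms are tuples, variables are capitalised strings, a tree is a pair (atom, children), and the names RULES, match and is_proof_tree are hypothetical.

```python
# Rules of Example 1.3 as (head, body) pairs; atoms are tuples (predicate, args...).
RULES = [
    (("p", "X", "Y"), [("p", "X", "U"), ("p", "U", "Y")]),  # r_1
    (("p", "X", "Y"), [("b", "X", "Y")]),                   # r_2
]


def is_var(term):
    # Convention of this sketch: variables start with an upper-case letter.
    return term[:1].isupper()


def match(pattern, ground, subst):
    """Extend subst so that pattern instantiates to the ground atom, else None."""
    if pattern[0] != ground[0] or len(pattern) != len(ground):
        return None
    subst = dict(subst)
    for s, g in zip(pattern[1:], ground[1:]):
        if is_var(s):
            if subst.setdefault(s, g) != g:  # inconsistent binding
                return None
        elif s != g:
            return None
    return subst


def is_proof_tree(tree, edb, rules=RULES):
    atom, children = tree
    if not children:  # condition 1: leaves are EDB atoms
        return atom in edb
    for head, body in rules:  # condition 2: some instantiated rule fits this node
        if len(body) != len(children):
            continue
        subst = match(head, atom, {})
        for pattern, (child_atom, _) in zip(body, children):
            if subst is None:
                break
            subst = match(pattern, child_atom, subst)
        if subst is not None:
            return all(is_proof_tree(c, edb, rules) for c in children)
    return False


edb = {("b", "joe", "bob"), ("b", "bob", "ann")}
figure_1_1 = (("p", "joe", "ann"), [
    (("p", "joe", "bob"), [(("b", "joe", "bob"), [])]),
    (("p", "bob", "ann"), [(("b", "bob", "ann"), [])]),
])
print(is_proof_tree(figure_1_1, edb))  # True
```

The tree named figure_1_1 is the proof tree of Figure 1.1, so the check succeeds for the database of Example 1.5.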
1.2.4 Optimizations

In  typical  database  applications,  the  extensional  database  is  much  larger  than  the  size  of 
a  program  that  manipulates  it.  Hence,  the  preferred  optimization  techniques  are  data- 
independent.  Two  programs  are  said  to  be  equivalent  if  they  generate  the  same  relations 
for  every  predicate  from  every  extensional  database;  the  optimizations  that  we  consider  are 
based on program equivalence, and are hence independent of any particular database. A variety
of efficient query evaluation techniques have been proposed (see [7], [36] for overviews)
and these techniques vary in application domain and efficiency. In this report, we investigate
opportunities for transforming programs into equivalent programs for which efficient
query evaluation techniques become available. The following examples illustrate optimizations
that may be performed on the closure programs of the preceding examples; these
optimizations  will  serve  as  canonical  examples  for  three  optimization  problems  that  we  will 
consider  throughout  this  dissertation. 

Example 1.6 Consider the symmetric closure program (say, P) of Example 1.2. We may
think of the basis relation b as the edge relation in a directed graph; that is, b(u,v) is true
precisely when there is an edge u → v in the graph. Then, the symmetric closure of the
graph may be obtained by adding the edge v → u to the graph, where u → v is any edge
in the original graph. That is, the program of Example 1.2 is equivalent to the following
nonrecursive program Q.

r_1': p(X,Y) :- b(Y,X).
r_2: p(X,Y) :- b(X,Y).

The program Q is obtained from the program P by replacing the recursive subgoal p(Y,X)
in the body of rule r_1 by the nonrecursive subgoal b(Y,X). The gains of such a replacement
stem from the elimination of recursion overhead in evaluating the program with respect to
a database; that is, we may use a query evaluator that is specific to nonrecursive programs.
□


Example 1.7 Consider the bilinear program P of Example 1.3, computing the transitive
closure of the basis predicate b. It is a well-known fact that P is equivalent to the following
linear recursive program Q.

r_1': p(X,Y) :- b(X,U), p(U,Y).
r_2: p(X,Y) :- b(X,Y).

The program Q is obtained from P by replacing the first recursive atom in the body of r_1,
p(X,U), with the nonrecursive atom b(X,U). An intuitive understanding of the equivalence
is as follows. Assume that b stands for the “parent” relation, and that p is the “ancestor”
relation, as we have previously discussed. Then, the recursive rule in P says that “Y is X's
ancestor if X has some ancestor U whose ancestor is Y”, and the recursive rule in Q says
that “Y is X's ancestor if X has some parent U whose ancestor is Y”. The gain from replacing
P by Q is that fast query evaluators that are specific to linear recursions become available
for application to the linear program Q. Specifically, assume that we wish to answer the
query p(joe,Y)?; that is, “who are the ancestors of joe”. Then, the “magic sets” technique
([36]) applied to P generates the entire p relation, and takes time that is quadratic in the
size of b (in the worst case); however, “right-linear evaluation” ([36]) computes the answer
in time that is linear in the size of b.² A proof of this claim is contained in [36]. □
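To make the linear-time claim concrete, here is a small Python sketch that answers a query of the form p(joe,Y)? by following the shape of the linear recursive rule: start from the bound first argument and repeatedly step along b. It is a plain worklist traversal written for illustration, not the right-linear evaluation algorithm of [36]; the function name ancestors and the set-of-pairs encoding of b are assumptions.

```python
from collections import defaultdict, deque


def ancestors(b, start):
    """Answer p(start, Y)? for the linear program
         p(X,Y) :- b(X,U), p(U,Y).
         p(X,Y) :- b(X,Y).
    without materialising the whole p relation.
    """
    successors = defaultdict(list)  # index b on its first argument
    for x, y in b:
        successors[x].append(y)
    answers, worklist = set(), deque([start])
    while worklist:
        x = worklist.popleft()
        for y in successors[x]:
            if y not in answers:  # each answer is enqueued at most once
                answers.add(y)
                worklist.append(y)
    return answers


b = {("joe", "bob"), ("bob", "ann"), ("sue", "tom")}
print(sorted(ancestors(b, "joe")))  # ['ann', 'bob']
```

Only tuples of b reachable from the query constant are touched, and each is inspected a bounded number of times, whereas a bottom-up evaluation of the bilinear program would also derive facts such as p(sue, tom) that are irrelevant to the query.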

Example 1.8 Finally, consider the program P of Example 1.4, computing the symmetric,
transitive closure of b. Again, we may think of b as the edge relation in a directed graph;
then, p(u,v) is true iff there is a path from u to v in the graph, obtained by following edges
in a forward or backward manner (in a mixed fashion). A little thought should suffice to
convince the reader that p may be computed by

1. First, computing the symmetric closure of the graph.

2. Then, computing the transitive closure of the result.

That is, the program P is equivalent to the following program, where the new predicate q
is the symmetric closure of b.

r_1: p(X,Y) :- p(X,U), p(U,Y).
s_1: p(X,Y) :- q(X,Y).
r_2': q(X,Y) :- q(Y,X).
r_3': q(X,Y) :- b(X,Y).

Now, since p is defined only by rules r_1 and s_1, and since q is defined by rules r_2' and r_3', we
may apply the optimizations of the previous examples to create the equivalent program

r_1': p(X,Y) :- q(X,U), p(U,Y).
s_1: p(X,Y) :- q(X,Y).
r_2'': q(X,Y) :- b(Y,X).
r_3': q(X,Y) :- b(X,Y).

Finally, since q is now defined only by the nonrecursive rules r_2'' and r_3', we may substitute
these rules for the predicate q to obtain the linear recursive program

r_1'': p(X,Y) :- b(X,U), p(U,Y).
r_1''': p(X,Y) :- b(U,X), p(U,Y).
s_1': p(X,Y) :- b(Y,X).
s_1'': p(X,Y) :- b(X,Y).

The gains of such a replacement are again obtained from the use of a query evaluator that
is specific to linear recursions. That is, magic sets takes quadratic time (in the size of b) to
evaluate p(joe,Y)? with respect to the original program, but right-linear evaluation takes
linear time when applied to the linear recursive program above. □
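The equivalence claimed in Example 1.8 can be spot-checked on small databases. The Python sketch below evaluates the original program of Example 1.4 and the final linear recursive program bottom-up and compares the results on randomly generated edge relations. Agreement on samples is evidence rather than a proof, and the function names closure_original and closure_linear, the node range and the sample count are all arbitrary choices made for this sketch.

```python
import random


def closure_original(b):
    # p(X,Y) :- p(X,U), p(U,Y).   p(X,Y) :- p(Y,X).   p(X,Y) :- b(X,Y).
    p = set(b)
    while True:
        derived = {(x, y) for (x, u) in p for (v, y) in p if u == v}
        derived |= {(y, x) for (x, y) in p}
        if derived <= p:
            return p
        p |= derived


def closure_linear(b):
    # p(X,Y) :- b(X,U), p(U,Y).   p(X,Y) :- b(U,X), p(U,Y).
    # p(X,Y) :- b(Y,X).           p(X,Y) :- b(X,Y).
    p = set(b) | {(y, x) for (x, y) in b}  # the two basis rules
    while True:
        derived = {(x, y) for (x, u) in b for (v, y) in p if u == v}
        derived |= {(x, y) for (u, x) in b for (v, y) in p if u == v}
        if derived <= p:
            return p
        p |= derived


random.seed(0)
for _ in range(200):
    size = random.randrange(6)
    b = {(random.randrange(5), random.randrange(5)) for _ in range(size)}
    assert closure_original(b) == closure_linear(b)
print("the two programs agree on all sampled databases")
```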

² Right-linear evaluation may be extended to the nonlinear transitive-closure program, but does not extend
to arbitrary nonlinear programs.
1.3 Top-down expansions

Recall  that  the  facts  generated  by  a  program  from  a  database  are  precisely  those  facts  that 
are  established  by  proof  trees.  A  top-down  expansion  is  obtained  by  lifting  the  arguments 
of  a  piece  of  a  proof  tree  to  variables;  that  is,  the  top-down  expansion  specifies  the  relation 
between  the  leaves  and  the  root  of  a  proof-tree  piece.  Alternatively,  one  may  think  of  a 
top-down  expansion  as  a  state  in  a  top-down  query  evaluation. 

Definition 1.1 Consider a (not necessarily function-free) program P with rules r_1, ..., r_n,
and let p be any intensional predicate defined by P. Assume that p has arity k. A top-down
expansion of p by P is defined inductively, as follows.

1. The tree with root p(X_1, ..., X_k) and leaf p(X_1, ..., X_k), where X_1, ..., X_k are
distinct variables, is a top-down expansion of p by P. By convention, this top-down
expansion is said to have depth 0.

2. Let r_i be a rule

h :- B.

in which the rule head h has principal functor p. The tree T with root h and leaves
B (in which the order of subgoals in B is preserved) is a top-down expansion of p by
P. The depth of this top-down expansion is 1.

3. Consider any top-down expansion T of p. Assume that q(Z) is the ith leaf in T. Let
R be a top-down expansion of depth 1, in which all variables have been renamed to
make them distinct from the variables in T, and let τ be the mgu of q(Z) with the
root of R. Let S be the expansion R, in which each variable Y appearing in the root
of R is replaced by τ(Y) throughout R. Replace every variable Y in T that appears
in the root of S by τ(Y), and replace the ith leaf in the result by the subtree S. The
result is a top-down expansion of p by P; the depth of the expansion is the depth of
the resulting tree.

□ 


Recall  that  a  variable  appearing  in  a  rule  is  distinguished  if  it  appears  in  the  head  of 
the rule, and nondistinguished otherwise. The reason that nondistinguished variables are
renamed  at  each  stage  of  the  expansion  is  that  these  variables  are  implicitly  existentially 
quantified  in  the  body  of  the  rule.  Distinguished  variables  are  renamed  only  to  make  them 
distinct  from  variables  (perhaps  nondistinguished)  in  the  parent  tree. 

Note  that  the  definition  of  a  top-down  expansion  requires  that  subgoals  in  a  rule  are 
written  in  a  way  that  preserves  the  order  of  the  subgoals  in  every  rule.  Throughout  this 
report,  we  will  assume  that  both  proof  trees  and  top-down  expansions  are  written  in  a  way 
that preserves the left-to-right order of the subgoals in every rule.³

³ This assumption refers to the writing down of proof trees, and has no bearing on the order in which
subgoals are evaluated by a query evaluator.
Figure 1.2: Top-down expansion.

Note  also  that  the  construction  of  a  top-down  expansion  is  polynomial  in  the  size  of  the 
expansion  (since  unification  is  polynomial  [36]).  For  Datalog  programs,  this  procedure  may 
easily  be  performed  in  LOGSPACE. 

Example 1.9 Consider the program of Example 1.3. Recall that this program computes p
as the transitive closure of the basis predicate b; that is, we may think of b as the parent
relation and p as the ancestor relation. The trees of Figure 1.2 are top-down expansions of
p(X,Y) using the rules in this program.

Note that the variable U is renamed at depth 2 in the tree of Figure 1.2 (b). Intuitively,
the top-down expansion states that Y is an ancestor of X if X has some parent U', U' has
some parent U and U has the parent Y. That is, Y is X's great-grandparent in this case.
□


We will also speak of a top-down expansion of an atom p(X); that is, we will specify an
atom to be unified with the root of the tree. Let T be a top-down expansion and p(X) an
atom. Assume that this atom unifies with the root of T under the most general unifier τ.
Then, the expression τ(T) represents the expansion obtained by replacing every variable Y
in T that appears in the root by τ(Y). We say that τ(T) is a top-down expansion of p(X)
by P.

Example 1.10 Consider the program

r_1: p(f(X)) :- q(X,U).
r_2: q(X,X) :- b(X).

Figure 1.3 (a) shows a top-down expansion of p with depth 2, and Figure 1.3 (b) exhibits
a top-down expansion of p(f(f(X))).

In Figure 1.3 (a), the variable X in rule r_2 has been renamed to the variable F. □
Figure 1.3: Top-down expansion with specified root.

1.3.1  Conjunctive  queries 

A single-rule, nonrecursive program is called a conjunctive query ([10]). That is, a conjunctive
query is of the form

p(X) :- B.

where B is a conjunction of atomic formulae all of whose principal functors are EDB predicate
names. Each top-down expansion with EDB predicates at the leaves defines a conjunctive
query in a natural way; that is, if T is such a top-down expansion, then the
corresponding conjunctive query is

h :- B.

where h is the root of T, and where B is the conjunction of the fringe of T.

Example 1.11 The top-down expansion of Figure 1.2 (b) represents the conjunctive query

p(X,Y) :- b(X,U'), b(U',U), b(U,Y).

□ 

We will also speak of top-down expansions with IDB predicates at the leaves as conjunctive
queries. The intention, in this case, is that the conjunctive query is applied
nonrecursively.

Example 1.12 The top-down expansion of Figure 1.2 (a) represents the conjunctive query

p(X,Y) :- p(X,U), p(U,Y).

The idea is that this conjunctive query represents all ways in which two p-facts (presumably
generated by the program) may be combined to produce a new p-fact using the top-down
expansion. □

⁴ This conjunctive query is often written in set-builder notation; however, we will use the
notation illustrated above for the purposes of uniformity.
We may now view a program as the (possibly infinite) union of the conjunctive queries
that  it  generates,  such  that  the  bodies  of  the  conjunctive  queries  are  occupied  by  only 
extensional  predicates.  In  fact,  we  will  also  speak  of  an  arbitrary  union  of  conjunctive 
queries  as  a  “program”. 

1.3.2  Containment  and  equivalence 

Let P and Q be programs defining the predicate p (perhaps among other IDB predicates).
We say that P is contained in Q with respect to p (written P ⊆_p Q, or just P ⊆ Q if p
is understood) iff for every database, the relation for p that is produced by P is a subset
of that produced by Q. The programs P and Q are said to be equivalent with respect to p
(written P ≡_p Q) iff P ⊆_p Q and Q ⊆_p P; as before, we omit all references to the predicate
p if this predicate is understood from the context.

Since a conjunctive query is a single-rule, nonrecursive program, these definitions apply
equally to such queries. However, for the purpose of deciding containment among conjunctive
queries, we need make no reference to the predicates being defined by the queries
because each conjunctive query defines a unique predicate. That is, if the heads of two
conjunctive  queries  are  labelled  by  two  different  predicates,  then  there  is  no  containment; 
otherwise,  the  containment  is  defined  with  respect  to  the  common  predicate  defined  by  the 
queries. 

Chandra and Merlin ([10]) have proposed a syntactic test for the containment of one
conjunctive query in another. Consider the conjunctive queries

C_1: p(X) :- B_1.
C_2: q(Y) :- B_2.

where B_1 and B_2 are conjunctions. Let f be a function on the variables in C_2. We extend f
to all symbols in C_2 by requiring f to be the identity on constants. Finally, we may extend
f to terms (and atomic formulae) in the obvious way; that is, we define f(q(d_1, ..., d_k))
to be f(q)(f(d_1), ..., f(d_k)). We say that f is a containment mapping from C_2 into C_1
(written f : C_2 → C_1) iff the following are true.

1. f(q(Y)) = p(X).

2. For each atom t in B_2, the atom f(t) appears in B_1.

For any atom t in C_2, f(t) is termed the destination of t under f.

The  value  of  containment  mappings  is  illustrated  in  the  following  theorem  of  Chandra 
and  Merlin  ([10]). 

Theorem 1.1 Containment mapping theorem (Chandra and Merlin).

For any conjunctive queries C_1 and C_2, C_1 ⊆ C_2 iff there is a containment mapping
f : C_2 → C_1. □

In Chapter 2 of this dissertation, we will present a dual to the containment mapping,
the conjunct mapping, which is also a necessary and sufficient condition for the detection
of conjunctive query containment. The value of the concept of conjunct mappings is that
the concept permits a complete description of the complexity of deciding containment
among conjunctive queries; the description is also contained in Chapter 2.

Example 1.13 Consider the conjunctive queries C_1 and C_2, as defined below.

C_1: p(X) :- a(X,B), b(A,B), b(C,B), c(B,B), c(A,D).
C_2: p(X) :- a(X,V), b(U,V), c(U,W).

The function f defined by f(X) = X, f(V) = B, f(U) = A, f(W) = D is a containment
mapping from C_2 into C_1. The destination (under f) of p(X) is p(X), of a(X,V) is a(X,B),
of b(U,V) is b(A,B) and of c(U,W) is c(A,D).

However, there is no containment mapping g : C_1 → C_2. Assume such a g exists.
Then, the destination of c(B,B) under g would have to be c(U,W) (the only c-atom in
C_2). However, g(c(B,B)) = c(U,W) requires that g(B) be both U and W, contradicting
the functionality of g. □
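The two conditions defining a containment mapping can be tested mechanically. The following Python sketch searches for a containment mapping by brute force and replays Example 1.13. It is meant only as an illustration: the exhaustive search is exponential in the number of variables and says nothing about the best achievable complexity, and the encoding (queries as (head, body) pairs, atoms as tuples, variables as capitalised strings) and the name containment_mapping are assumptions of this sketch.

```python
from itertools import product


def is_var(term):
    # Convention of this sketch: variables start with an upper-case letter.
    return term[:1].isupper()


def containment_mapping(c_from, c_to):
    """Return a containment mapping f : c_from -> c_to as a dict, or None."""
    head_from, body_from = c_from
    head_to, body_to = c_to
    variables = sorted({t for atom in [head_from] + body_from
                        for t in atom[1:] if is_var(t)})
    targets = sorted({t for atom in [head_to] + body_to for t in atom[1:]})

    for image in product(targets, repeat=len(variables)):
        f = dict(zip(variables, image))

        def apply(atom):
            # f is the identity on predicate names and on constants.
            return (atom[0],) + tuple(f.get(t, t) for t in atom[1:])

        # Condition 1: the head maps to the head.
        # Condition 2: every body atom maps to some body atom of c_to.
        if apply(head_from) == head_to and all(apply(a) in body_to for a in body_from):
            return f
    return None


C1 = (("p", "X"), [("a", "X", "B"), ("b", "A", "B"), ("b", "C", "B"),
                   ("c", "B", "B"), ("c", "A", "D")])
C2 = (("p", "X"), [("a", "X", "V"), ("b", "U", "V"), ("c", "U", "W")])

print(containment_mapping(C2, C1))  # {'U': 'A', 'V': 'B', 'W': 'D', 'X': 'X'}
print(containment_mapping(C1, C2))  # None
```

By Theorem 1.1, the first result shows that C_1 ⊆ C_2, and the second shows that C_2 is not contained in C_1.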

Now, we may define the containment of top-down expansions (with IDB predicates
permitted at the leaves) as the containment of the corresponding conjunctive queries. Given
the relations between rules, top-down expansions and conjunctive queries, we will sometimes
also speak of the containment of a top-down expansion in a rule.

Finally, recall that we may think of a program as a (possibly infinite) union of conjunctive
queries, where the bodies of these queries are atomic formulae whose principal functors are
EDB predicates. Sagiv and Yannakakis ([29]) use this idea to reduce program containment
to the containment of conjunctive queries. The theorem of Sagiv and Yannakakis says that,
for a program P to be contained in a program Q, each conjunctive query generated by
P must be contained in some conjunctive query generated by Q; that is, there can be no
“mixing and matching”.

Theorem 1.2 (Sagiv and Yannakakis).

Consider programs P and Q, defining the IDB predicate p. Then, P ⊆_p Q iff for every
conjunctive query C_P generated by P with EDB predicates at the leaves and p at the root,
there is a conjunctive query C_Q generated by Q with EDB predicates at the leaves such
that C_P ⊆ C_Q. □

1.3.3  Tree  shapes 

As  we  will  show  in  the  next  section,  the  optimizations  of  the  previous  section  can  be 
described  in  terms  of  normal-form  proof  trees.  That  is,  each  optimization  is  made  possible 
because all facts generated by the relevant program are generated by proof trees of a certain
“shape”. 

Recall  that  proof  trees  (and  top-down  expansions)  are  written  in  a  way  that  preserves 
the  left-to-right  order  of  the  subgoals  in  any  rule.  The  idea  of  a  proof-tree  shape  is  the 
intuitively  obvious  one,  given  this  assumption.  Following  Helm  ([13]),  we  may  formalise  the 
concept by defining a shape as a list over the rule names.

Definition 1.2 Let P be a program with rules r_1, ..., r_n. A tree shape (or just shape) is a
list that is defined as follows.

1. The empty list [] is a shape. This list represents all top-down expansions of depth 0.

2. Let r_i be a rule defining the predicate p, with intensional subgoals (in order) t_1, ..., t_k
(that is, the principal functor of each t_j is an intensional predicate name). Then, the
list

[r_i [] ... []]    (with k occurrences of [])

is a shape. This shape represents the top-down expansion of p with depth 1, in which
r_i is used to construct the expansion. The (j + 1)th component of this list is said to
represent the leaf t_j.

3. Let S be a nonempty shape, representing some top-down expansion T of some predicate
p. Assume that the ith occurrence of the empty list in S represents the ith leaf
q(Z) in T. Assume further that the head of rule r_j unifies with this leaf under the mgu
τ, and that r_j has the intensional subgoals (in order) q_1(Z_1), ..., q_k(Z_k). Construct
the top-down expansion T' by expanding the ith leaf q(Z) in T through rule r_j. Let
S' be the shape S, where the empty list representing the ith leaf q(Z) in T is replaced
by the list

[r_j [] ... []]    (with k occurrences of [])

Then S' is a shape, representing the top-down expansion T'. Further, the lth occurrence
of the empty list in the above list is said to represent the lth leaf q_l(τ(Z_l)) in
the newly-added subtree.

□ 

It  is  clear  that  every  top-down  expansion  has  a  well-defined  shape.  We  will  also  speak 
of  the  shape  of  a  top-down  expansion  of  a  specified  atom,  and  of  the  shape  of  a  proof  tree. 

Example 1.14 Figure 1.2 contains top-down expansions generated by the program of
Example 1.3. The expansion of Figure 1.2 (a) has the shape [r_1[][]], and expansion (b)
in that figure has the shape [r_1[r_1[r_2][r_2]][r_2]]. Similarly, Figure 1.3 exhibits top-down
expansions generated by the program of Example 1.10. Figure 1.3 (a) contains a top-down
expansion with the shape [r_1[r_2]], and part (b) of that figure contains a top-down expansion
with the same shape, but with the specified root p(f(f(A))). □
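
Shapes are convenient to manipulate as nested lists. The following Python sketch mirrors Definition 1.2 under stated assumptions: rule names are written as the strings `"r1"` and `"r2"`, and the hypothetical table `ARITY` records the number of intensional subgoals of each rule (two for r_1 and none for r_2, as the shapes quoted in Example 1.14 suggest). The helper names `unit`, `expand`, `depth` and `show` are illustrative only.

```python
ARITY = {"r1": 2, "r2": 0}  # assumed number of intensional subgoals per rule

def unit(rule):
    # Shape of the depth-1 expansion through `rule`: [rule, [], ..., []].
    return [rule] + [[] for _ in range(ARITY[rule])]

def expand(shape, i, rule):
    """Return a copy of `shape` in which the i-th empty list (1-based,
    counted left to right) is replaced by the depth-1 shape of `rule`."""
    counter = [0]

    def walk(s):
        if s == []:
            counter[0] += 1
            return unit(rule) if counter[0] == i else []
        return [s[0]] + [walk(child) for child in s[1:]]

    result = walk(shape)
    if counter[0] < i:
        raise IndexError("shape has fewer than i unexpanded leaves")
    return result

def depth(shape):
    # The empty list has depth 0; each rule application adds one level.
    if shape == []:
        return 0
    return 1 + max((depth(c) for c in shape[1:]), default=0)

def show(shape):
    # Render a shape in bracket notation, e.g. [r1[][]].
    if shape == []:
        return "[]"
    return "[" + shape[0] + "".join(show(c) for c in shape[1:]) + "]"

a = unit("r1")
b = expand(a, 1, "r1")        # expand the leftmost leaf through r1
for _ in range(3):            # close the three remaining leaves through r2
    b = expand(b, 1, "r2")
print(show(a), show(b))
```

Because `expand` always counts the unexpanded leaves afresh, repeatedly expanding leaf 1 closes the leaves from left to right.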

The  structure  of  a  shape  can  be  simplified  if  the  program  in  question  is  linear  (recall 
that  a  program  is  linear  if  at  most  one  atom  in  the  body  of  any  rule  is  intensional).  In  the 
linear  case,  each  top-down  expansion  may  be  represented  as  a  string  over  the  rule  names, 
and  sets  of  top-down  expansions  may  be  represented  by  regular  expressions  over  the  rule 
names ([25]).

Example 1.15 Consider the linear logic program of Example 1.10. The top-down expansions
of Figure 1.3 have the shape r_1 r_2; note that the order in which the rule names appear
in the shape corresponds to the order in which rules are applied in a top-down fashion,
starting with the root. The set of all top-down expansions generated by the program, with
EDB predicates at the leaves, may be denoted by the regular expression r_1* r_2. □
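
In the linear case, membership of a shape in such a set reduces to ordinary regular-expression matching. The sketch below assumes a program with one recursive rule and one basis rule, written `r1` and `r2` in the code, whose closed expansions apply r_1 any number of times and then r_2 once; the names `CLOSED` and `is_closed_linear_shape` are hypothetical.

```python
import re

# Assumption: r1 is the recursive rule and r2 the basis rule, so a closed
# expansion applies r1 zero or more times and then r2 exactly once.
CLOSED = re.compile(r"(?:r1)*r2")

def is_closed_linear_shape(rules):
    # `rules` lists the rule names in top-down order of application.
    return CLOSED.fullmatch("".join(rules)) is not None

print(is_closed_linear_shape(["r1", "r2"]))
print(is_closed_linear_shape(["r2", "r1"]))
```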

Extending  functions 

We formalise our earlier notions of the extensions of functions on the variables in an expansion,
as follows. Let f be any function defined on some of the variables in a top-down
expansion T. We extend f to all variables, and to constants, by requiring f to be the
identity on all variables on which it is not defined, and on all constants. Similarly, for any
term, we define f(q(d_1, ..., d_k)) to be q(f(d_1), ..., f(d_k)). Finally, we define f(T) to be
the result of replacing every atom a in T by f(a). The idea, as before, is that we merely
replace every occurrence of every variable on which f is defined by its image under f. In the
remainder of this chapter, we will assume that all functions have been extended as described
above.
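
The extension of a function to terms, atoms and whole expansions can be sketched as follows. This is a minimal Python illustration under assumptions that are not part of the text: variables and constants are strings, a compound term or atom is a tuple whose first component is its functor, and an expansion is given simply as a list of its atoms. The names `extend` and `apply_to_expansion` are hypothetical.

```python
def extend(f):
    """Extend a partial mapping f on variables to all terms and atoms.
    Strings are variables or constants; tuples are (functor, arg1, ...)."""
    def apply(term):
        if isinstance(term, tuple):
            # Keep the functor and map every argument.
            return (term[0],) + tuple(apply(t) for t in term[1:])
        # Identity on constants and on variables where f is undefined.
        return f.get(term, term)
    return apply

def apply_to_expansion(f, atoms):
    # f(T): replace every atom a of T by f(a).
    g = extend(f)
    return [g(a) for a in atoms]

f = {"X": "a", "Y": ("s", "Z")}
print(extend(f)(("p", "X", ("s", "Y"), "W")))
```

Note that the images under `f` are not themselves rewritten again, so each variable is replaced exactly once.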

Labels 

Consider any sequence T_1, ..., T_n of top-down expansions, not necessarily of different shapes.
We will assume that each node in each expansion is labelled to be distinct from all other
nodes in the expansion, and from all nodes in every other expansion. Given a label l, the
node to which l refers will be written node(l). We assume that labels are preserved through
the application of functions to a top-down expansion, as described above. Now, given any
containment mapping f : T_2 → T_1 where T_1 and T_2 are labelled top-down expansions, we
assume the selection of a labelled destination for each labelled node in T_2 under f; that is,
for each node node(k) that is a leaf or the root in T_2, we select a unique label l such that
f(node(k)) = node(l).

Uniqueness  of  shapes 

It is easily seen that every shape uniquely defines a top-down expansion, up to the renaming
of variables. The idea is that the top-down applications of rules “commute”, and we may
therefore construct a top-down expansion of a given shape using any order of rule applications.
More formally, let T_1, T_2 and T_3 be top-down expansions in which all variables in
each expansion are renamed to be distinct, and where T_2 and T_3 have depth 1 (i.e. each
represents the application of a single rule). Let node(l) and node(k) be distinct leaves in
T_1. Assume that we expand the leaf node(l) in T_1 through T_2 under any unifier τ_1, and
expand node(k) in the result through T_3 under any unifier τ_2, to create some expansion S
(see Figure 1.4).

Then, we may create S in the following way.

1. Expand node(k) in T_1 through T_3 using the unifier σ_2 = τ_2(τ_1) (recall that all functions
are extended to be the identity on variables on which they are not defined).

2. Define the unification σ_1 to be the identity on variables in T_1 and T_3, and to be τ_2(τ_1)
on the variables in T_2. Expand node(l) in the result of 1 above through T_2 using the
unification σ_1.

Figure 1.4: Changing the order of expansion.

That is, the unifications σ_1 and σ_2 permit us to construct the same expansion S by applying
the expansions T_2 and T_3 to node(l) and node(k) in reverse order. Hence, if we
use the mgu at each stage, then both orders of applications must yield equally general
top-down expansions. An induction on n allows us to reach the same conclusion if the
leaves node(k_1), ..., node(k_n) are expanded through n depth-1 expansions, respectively.
A straightforward induction on the size of a shape permits us to claim that
every shape uniquely defines a top-down expansion, up to renaming of variables. Hence, we
may abuse notation by speaking of the containment of a shape (or top-down expansion) in
a shape.

Note that the uniqueness of shapes allows us to construct a top-down expansion from
top-down expansions T_1, T_2 and T_3 of arbitrary depth, by renaming the variables of the
expansions apart, expanding some leaf in T_1 through T_2 using the mgu of the leaf in T_1
with the root of T_2, and similarly expanding any leaf in the result through T_3.

1.3.4  Changing  shapes 

We  present  some  results  that  will  be  of  use  in  the  following  sections.  The  idea  is  that 
containment  mappings  between  top-down  expansions  remain  essentially  unchanged  under 
certain  operations  that  are  performed  on  the  expansions  in  question,  and  expansions  can 
therefore  be  robustly  manipulated  to  change  their  shapes. 

Replicating  expansions 

Let T_1, ..., T_n be a sequence of expansions, not necessarily of different shapes. Assume
that the label of each node in T_i is a list of the form [i, l], where l is an integer unique to
the relevant node in T_i. We will speak of replicating some tree T_i in the above sequence.
To create k copies T_{i1}, ..., T_{ik} of T_i, we construct k rename functions ren_1, ..., ren_k that
rename each variable in T_i to be distinct from any variables in T_1, ..., T_n (or their copies,
if any), such that the ranges of these rename functions are distinct. Then, for each j, T_{ij}
is ren_j(T_i). Further, the labels of each copy of T_i are propagated by setting the label of
node(m) in T_{ij} to [j|m].
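
Replication amounts to renaming variables apart while keeping track of which node each copy came from. The following Python sketch illustrates the idea under assumptions of our own: an expansion is a dictionary from node labels to atoms (predicate, argument tuple), variables start with an upper-case letter, fresh variables are formed by appending a counter (so the original variables must not already use such names), and the label of the copy of node m in the jth replica is written as the pair `(j, m)`. The names `replicate` and `_fresh` are hypothetical.

```python
from itertools import count

_fresh = count(1)  # assumed global supply of fresh variable suffixes

def replicate(nodes, k):
    """nodes: dict label -> atom (pred, args); variables start upper-case.
    Returns k copies with pairwise disjoint fresh variables; the copy of
    node m in replica j carries the label (j, m)."""
    variables = sorted({t for _, args in nodes.values()
                        for t in args if t[:1].isupper()})
    copies = []
    for j in range(1, k + 1):
        # The rename function for replica j; its range is disjoint from all others.
        ren = {v: f"{v}_{next(_fresh)}" for v in variables}
        copies.append({(j, m): (pred, tuple(ren.get(t, t) for t in args))
                       for m, (pred, args) in nodes.items()})
    return copies

for copy in replicate({1: ("p", ("X", "Y")), 2: ("b", ("X", "a"))}, 2):
    print(copy)
```

Within one replica a shared variable keeps a single new name, so the copy is an exact renaming of the original.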

Theorem 1.3 Expansion Theorem

Let T_1 and T_3 be top-down expansions with distinct variables, and let T_2 and T_4 be expansions
with distinct variables. Assume that f : T_2 → T_1 and g : T_4 → T_3 are containment
mappings. Assume that R is obtained by expanding the leaf node(l) in T_1 through T_3.
Let node(k_1), ..., node(k_n) be leaves in T_2 such that f(node(k_i)) = node(l), and let S be
the result of expanding each leaf node(k_i) in T_2 through the ith replica T_{4i} of T_4. Then,
there is a containment mapping h : S → R that preserves the destinations of f and g; that
is, if f(node(c)) = node(d) and node(c) is a leaf in S, then h(node(c)) = node(d); and, if
g(node(c)) = node(d), then h(node([i|c])) = node(d) for all i. Figure 1.5 depicts this claim.

Figure 1.5: The Expansion Theorem.


Proof. Let the roots of T_1, T_2, T_3 and T_4 be r_1, r_2, r_3 and r_4 respectively. Similarly, let
the root of T_{4j} be r_{4j} for all j. Let τ be the most general unifier of node(l) with the root
r_3 of T_3, and σ the most general unifier of the sequence r_{41}, ..., r_{4n} with the sequence of
leaves node(k_1), ..., node(k_n) in T_2. Since g is a containment mapping from T_4 into T_3, the
domain-disjoint functions g_j = g(ren_j^{-1}) are containment mappings from T_{4j} into T_3 that
preserve the destinations of g. Define the function ρ as follows.

ρ(V) = τ(f(V))    for V in T_2
ρ(V) = τ(g_j(V))  for V in T_{4j}

Now, ρ is a unification under which the leaves node(k_1), ..., node(k_n) may be expanded
through the subtrees T_{41}, ..., T_{4n} respectively (to create some expansion S', say). Consider
any leaf node(m) in T_2; assume that the destination of node(m) under f is node(i) in T_1.
Then, node(m) in S' is syntactically identical to node(i) in R. Similarly, consider any leaf
node(m) in T_4; assume that the destination of node(m) under g is node(i) in T_3. Then for
each j, node([j|m]) in S' is syntactically identical to node(i) in R. That is, the identity I
is a containment mapping from S' into R that preserves the destinations of f and g. If σ is
the most general unifier of node(k_1), ..., node(k_n) with the roots r_{41}, ..., r_{4n} of T_{41}, ..., T_{4n},
then by the properties of the most general unifier there is some function h such that ρ = h(σ);
hence, h is a containment mapping from S into S' such that h(node(i)) = node(i) for all
nodes in S, and our result follows by composing h and I. □

Figure 1.6: Assumption for the Splicing Theorem.

Theorem  1.4  Splicing  Theorem 

Let T_1, T_2, T_3 and T_4 be top-down expansions with distinct variables. Assume that node(l)
is a leaf in T_1 and node(m) is a leaf in T_2, and that there is a containment mapping
f : T_4 → T_2. Let node(k_1), ..., node(k_n) be the leaves in T_4 whose destination under f is
node(m) (see Figure 1.6). Construct the expansion R by expanding node(l) in T_1 through
T_2, and expanding node(m) in the result through T_3. Construct S by expanding node(l) in
T_1 through T_4, and expanding each leaf node(k_j) in the result through the jth replica T_{3j}
of T_3. Then, there is a containment mapping h from S into R such that the destination of
each leaf node(i) in S is as follows.

1. If node(i) is a node in T_1, then h(node(i)) = node(i).

2. If node(i) is a node in T_4, then h(node(i)) = f(node(i)).

3. If i is of the form [j|p] (that is, node(i) is the copy of node(p) in the jth replica of T_3),
then h(node(i)) = node(p).

Figure 1.7 depicts the idea.

Proof. The identity is a containment mapping from T_1 into T_1, and the function g_j = ren_j^{-1}
is a containment mapping from the jth replica of T_3 into T_3; our result follows by two
applications of the Expansion Theorem. □

1.4  Subtree  eliminations 

1.4.1  Normal-form  optimizations 

In this section, we will present a variety of optimizations that are defined in terms of normal-form
conjunctive queries. The domain on which we will define these optimizations is that
of single-IDB programs; that is, programs in which there is only one intensional predicate,
p. We do not assume that the programs are either safe or function-free. The following
program P will serve as the canonical example of a single-IDB program; the IDB predicate
defined by the program is the predicate p.

Figure 1.7: Illustrating the Splicing Theorem.

Figure 1.8: A one-bounded expansion.

Let P consist of the n recursive rules

r_1 : p(X_{10}) :- p(X_{11}), ..., p(X_{1k_1}), C_1.
...
r_i : p(X_{i0}) :- p(X_{i1}), ..., p(X_{ik_i}), C_i.
...
r_n : p(X_{n0}) :- p(X_{n1}), ..., p(X_{nk_n}), C_n.

and the m nonrecursive rules

b_1 : p(Y_{10}) :- V_1.
...
b_j : p(Y_{j0}) :- V_j.
...
b_m : p(Y_{m0}) :- V_m.

where the C_i and V_j are arbitrary conjunctions of EDB predicates. Examples 1.1 - 1.4
exhibit such programs.

Let us define a top-down expansion to be closed if all its leaves are EDB predicates.

1.4.2 One-boundedness: definition and results

Define a top-down expansion to be one-bounded if at most the root is expanded through a
recursive rule; that is, if the top-down expansion has depth at most 2 (see Figure 1.8). The
program P is said to be one-bounded iff one-boundedness is a normal form for the proof
trees generated by the program; that is, iff every closed conjunctive query generated by P
is contained in some closed one-bounded conjunctive query.

If P is one-bounded, then P can be converted into an equivalent nonrecursive program
Q by expanding each recursive atom in each rule through each nonrecursive rule. That is,
the equivalent nonrecursive program Q is constructed as follows.

Let q be a new predicate symbol. Construct the program Q, with n + m + 1 rules
{r'_i | 1 ≤ i ≤ n} ∪ {b'_i | 1 ≤ i ≤ m} ∪ {c}, as follows. If r_i is a recursive rule, then replace
r_i by the rule

r'_i : p(X_{i0}) :- q(X_{i1}), ..., q(X_{ik_i}), C_i.

That is, we replace all recursive occurrences of p in r_i with a corresponding occurrence of
q.

Next, we introduce the rule c, which merely initialises p to q:

c : p(V_1, ..., V_k) :- q(V_1, ..., V_k).

The V's are distinct variables. Finally, each nonrecursive rule b_i is replaced by the rule

b'_i : q(Y_{i0}) :- V_i.

which initialises q using the nonrecursive rules for p. Assume P is one-bounded; then, the
closed one-bounded top-down expansions of P are isomorphic to the closed top-down
expansions of Q; that is, each such top-down expansion generated by either program can also
be generated by the other program. The equivalence of P and Q then follows by the theorem
of Sagiv and Yannakakis (Theorem 1.2).

Example 1.16 Consider the program P of Example 1.2. The program Q is the program

r' : p(X,Y) :- q(Y,X).
c : p(X,Y) :- q(X,Y).
b' : q(X,Y) :- b(X,Y).

The intermediate predicate q can, in this case, be eliminated to obtain the linear program
Q' of Example 1.6, repeated below.

r_1 : p(X,Y) :- b(Y,X).
r_2 : p(X,Y) :- b(X,Y).

The closed one-bounded expansions of P are easily seen to be isomorphic to the closed
expansions of Q', as indicated by Figure 1.9. □
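
The construction of the nonrecursive program from a one-bounded single-IDB program is purely syntactic, and can be sketched in a few lines of Python. The representation is an assumption made for illustration: a recursive rule is a triple (head arguments, list of argument tuples of its p-subgoals, list of EDB atoms), a basis rule is a pair (head arguments, list of EDB atoms), and an atom is a pair (predicate, arguments). The function name `one_bounded_program` and the variable names `V1`, `V2`, ... are hypothetical. The sample input is the program with the single recursive rule p(X,Y) :- p(Y,X) and the basis rule p(X,Y) :- b(X,Y).

```python
def one_bounded_program(recursive, basis, arity, p="p", q="q"):
    """Build the rules r'_i, c and b'_i from a single-IDB program.
    recursive: list of (head_args, [args of each p-subgoal], edb_atoms)
    basis:     list of (head_args, edb_atoms)
    Returns a list of (head_atom, body_atoms) rules."""
    rules = []
    for head_args, p_subgoals, edb in recursive:
        # r'_i: every recursive occurrence of p in the body becomes q.
        rules.append(((p, head_args), [(q, a) for a in p_subgoals] + list(edb)))
    # c: p(V1, ..., Vk) :- q(V1, ..., Vk), with distinct variables.
    vs = tuple(f"V{i}" for i in range(1, arity + 1))
    rules.append(((p, vs), [(q, vs)]))
    for head_args, edb in basis:
        # b'_i: the basis rules now define q instead of p.
        rules.append(((q, head_args), list(edb)))
    return rules

recursive = [(("X", "Y"), [("Y", "X")], [])]
basis = [(("X", "Y"), [("b", ("X", "Y"))])]
for head, body in one_bounded_program(recursive, basis, arity=2):
    print(head, ":-", body)
```

The sketch performs only the rewriting; it does not test whether the input program is in fact one-bounded, which is what makes the rewriting sound.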

The gains of converting P into Q in this way are obtained from the elimination of
recursion overhead in query processing.




Figure 1.9: Illustrating Example 1.16 (panels (a)-(d): expansions of P and expansions of Q').




Results 

One-boundedness is easily seen to be decidable; we sketch a proof in Section 1.5. The
complexity of one-boundedness has been investigated by Kanellakis ([20]) for a restricted
class of programs. A sirup is a single-IDB Datalog program with a single recursive rule,
and a basis rule of the form

p(X_1, ..., X_k) :- b(X_1, ..., X_k).

where the X's are distinct variables, and where the EDB predicate b appears nowhere else
in the program. Kanellakis ([20]) has shown that one-boundedness is NP-hard for linear
sirups; that is, for programs that in addition to the basis rule above contain the recursive
rule

p(Y) :- p(Z), e_1(W_1), ..., e_n(W_n).

Kanellakis' result, however, assumes an unbounded number of repetitions of EDB predicates
in the body of the recursive rule. In Chapter 2, we will show that one-boundedness is NP-hard
even if there are no more than 4 repetitions of any EDB predicate in the body of
the recursive rule. In the same chapter, we will show that one-boundedness is decidable in
polynomial time for linear sirups in which no EDB predicate is repeated in the body of the
recursive rule. Finally, in Section 1.5, we will present a polynomial-time algorithm that is
sufficient (but not necessary) to detect one-boundedness in arbitrary single-IDB programs;
the idea is a reduction to the decision procedure for linear sirups.


1.4.3  Base-case  linearizability:  definition  and  results 

Recall our running assumption that top-down expansions are written in a way that preserves
the left-to-right order of the subgoals in every rule. Define a top-down expansion generated
by P to be right-linear if only the rightmost occurrence of p is ever recursively expanded in
the expansion (see Figure 1.10). The program P is said to be linearizable by basis iff
right-linearity is a normal form for the proof trees generated by the program; that is, iff every
closed conjunctive query generated by P is contained in some closed right-linear conjunctive
query.

If P is basis-linearizable, then P can be converted into an equivalent linear recursive
program, as follows.

Let q be a new predicate symbol. Construct the program Q, with n + m + 1 rules
{r'_i | 1 ≤ i ≤ n} ∪ {b'_i | 1 ≤ i ≤ m} ∪ {c}, as follows. If r_i is a nonlinear rule (k_i > 1),
then replace r_i by the linear rule

r'_i : p(X_{i0}) :- q(X_{i1}), ..., q(X_{i(k_i - 1)}), p(X_{ik_i}), C_i.

That is, we replace all but the last recursive occurrence of p in r_i with a corresponding
occurrence of q. If r_i is linear (k_i = 1), then r'_i is the same as r_i:

r'_i : p(X_{i0}) :- p(X_{i1}), C_i.

Figure 1.10: A right-linear expansion.

Next, we introduce the rule c, which merely initialises p to q:

c : p(V_1, ..., V_k) :- q(V_1, ..., V_k).

The V's are distinct variables. Finally, each nonrecursive rule b_i is replaced by the rule

b'_i : q(Y_{i0}) :- V_i.

which initialises q using the nonrecursive rules for p.

If P is basis-linearizable, then the closed right-linear top-down expansions of P are
isomorphic to the closed top-down expansions of Q; that is, each such top-down expansion
generated by either program can also be generated by the other program. The equivalence
of P and Q then follows by the theorem of Sagiv and Yannakakis (Theorem 1.2).
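
The linearizing construction can likewise be sketched as a syntactic rewriting. The Python sketch below uses the same assumed representation as before (a recursive rule is a triple of head arguments, argument tuples of its p-subgoals in left-to-right order, and EDB atoms; a basis rule is a pair of head arguments and EDB atoms); the name `linearize_by_basis` and the variables `V1`, `V2`, ... are hypothetical. The sample input is the transitive-closure style program with the recursive rule p(X,Y) :- p(X,U), p(U,Y) and the basis rule p(X,Y) :- b(X,Y).

```python
def linearize_by_basis(recursive, basis, arity, p="p", q="q"):
    """Build the rules r'_i, c and b'_i of the linear program.
    recursive: list of (head_args, [args of each p-subgoal], edb_atoms)
    basis:     list of (head_args, edb_atoms)
    Returns a list of (head_atom, body_atoms) rules."""
    rules = []
    for head_args, p_subgoals, edb in recursive:
        # r'_i: all but the last recursive occurrence of p become q;
        # a linear rule (one p-subgoal) is therefore left unchanged.
        body = [(q, a) for a in p_subgoals[:-1]] + [(p, a) for a in p_subgoals[-1:]]
        rules.append(((p, head_args), body + list(edb)))
    # c: p(V1, ..., Vk) :- q(V1, ..., Vk), with distinct variables.
    vs = tuple(f"V{i}" for i in range(1, arity + 1))
    rules.append(((p, vs), [(q, vs)]))
    for head_args, edb in basis:
        # b'_i: the basis rules now define q instead of p.
        rules.append(((q, head_args), list(edb)))
    return rules

recursive = [(("X", "Y"), [("X", "U"), ("U", "Y")], [])]
basis = [(("X", "Y"), [("b", ("X", "Y"))])]
for head, body in linearize_by_basis(recursive, basis, arity=2):
    print(head, ":-", body)
```

As with any such rewriting, the output is equivalent to the input only when the input program is basis-linearizable; the sketch does not check this.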

Example 1.17 Consider the program P of Example 1.3. The program Q is the program

r' : p(X,Y) :- q(X,U), p(U,Y).
c : p(X,Y) :- q(X,Y).
b' : q(X,Y) :- b(X,Y).

The intermediate predicate q can, in this case, be eliminated to obtain the linear program
Q' of Example 1.7, repeated below.

r_1 : p(X,Y) :- b(X,U), p(U,Y).
r_2 : p(X,Y) :- b(X,Y).

The closed right-linear expansions of P are easily seen to be isomorphic to the expansions
of Q', as indicated by Figure 1.11. □

Figure 1.11: Illustrating Example 1.17.

The gains of replacing P by Q are obtained through the use of special-purpose query
evaluators for linear recursive programs. In fact, Ullman and van Gelder ([37]) have shown
that the evaluation of programs with the polynomial fringe property (a superset of linear
recursive programs) may be performed in NC.

Results 

Basis-linearizability was proposed by Zhang et al. ([40]*), who studied this property in
terms of a bilinear sirup with rectified rule heads (i.e., no variable is repeated in the head
of any rule), such that there is at most one EDB subgoal in the body of the recursive rule.
That is, they consider programs of the following form.

p(X_1, ..., X_k) :- p(Y), p(Z), e(W).
p(X_1, ..., X_k) :- b(X_1, ..., X_k).

They claim a polynomial-time decision procedure for the detection of basis-linearizability
for such programs, although their proof has a flaw in a key lemma; we will discuss the error
in Chapter 3. In Chapter 3, we will show that basis-linearizability is decidable for bilinear
sirups with an unbounded number of EDB subgoals in the recursive rule, as long as the
EDB predicates are distinct; that is, we consider programs of the following form, in which
the e_i are distinct.

p(X_1, ..., X_k) :- p(Y), p(Z), e_1(W_1), ..., e_n(W_n).
p(X_1, ..., X_k) :- b(X_1, ..., X_k).

The techniques of Chapter 2 can be used to show that our decision procedure is polynomial.
We will show in Chapter 3 that the proof of [40] does not directly extend to such programs,
and in fact we also cover (as they do not) the case in which the program can be linearized
in a different way; however, our proof has been motivated in large part by their treatment.

* This paper has recently been published in the ACM Transactions on Database Systems, but the
proof has been omitted.

Ramakrishnan et al. ([25]) have shown that the decision procedure of Chapter 3 does
not extend to the case in which EDB repetitions are permitted in the body of the recursive
rule, and show that detecting basis-linearizability is NP-hard in this case; however, their
reduction involves an unbounded number of repetitions of EDB predicates in the recursive
rule. In Chapter 2, we show that the detection of basis-linearizability is NP-hard for bilinear
sirups, even if no EDB predicate appears more than 4 times in the body of the recursive
rule.

In Chapter 4, we show that the detection of basis-linearizability in head-rectified, single-IDB
Datalog programs is undecidable even if the program has one bilinear rule, an unbounded
number of linear rules and only 5 basis rules. Our treatment also shows that
program containment is undecidable for a restricted class of linear Datalog programs.

Finally, in Section 1.5, we provide polynomial-space and polynomial-time algorithms
that are sufficient (but not necessary) for the detection of basis-linearizability in arbitrary
single-IDB Datalog programs.

1.4.4  Sequencability:  definition  and  results 

Define a top-down expansion generated by P to be sequenced if a recursive atom generated
by the (top-down) application of a recursive rule r_i is never expanded through the rule r_j if
i > j (see Figure 1.12). The program P is said to be sequencable iff sequencing is a normal
form for the proof trees generated by the program; that is, iff every closed conjunctive
query generated by P is contained in some closed, sequenced conjunctive query. If P is
sequencable, then it may be replaced by the following program Q.

Let q_1, ..., q_n be new and distinct predicate symbols, and construct the program Q from
program P, as follows. First, replace the rule r_1 by the two rules r'_1 and s_1, where r'_1 is the
same as r_1:

r'_1 : p(X_{10}) :- p(X_{11}), ..., p(X_{1k_1}), C_1.
s_1 : p(V_1, ..., V_k) :- q_1(V_1, ..., V_k).

The V's are distinct variables. Next, replace each recursive rule r_i (i > 1) by the two
rules

r'_i : q_{i-1}(X_{i0}) :- q_{i-1}(X_{i1}), ..., q_{i-1}(X_{ik_i}), C_i.
s_i : q_{i-1}(V_1, ..., V_k) :- q_i(V_1, ..., V_k).

As before, the V's are distinct variables. Finally, replace each nonrecursive rule b_i by the
rule

b'_i : q_n(Y_{i0}) :- V_i.

Q computes those facts which would be produced by a bottom-up evaluation of P, in which
we initialise p using the b_i, and then, in sequence, close under r_n, ..., r_1. It is easily
seen that the normal-form closed conjunctive queries of P are isomorphic to the closed
conjunctive queries of Q. Therefore, if P is sequencable, then P and Q are equivalent.

Figure 1.12: A sequenced expansion.

Figure 1.13: Illustrating Example 1.18.
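
The sequencing construction can be sketched in the same style. The Python sketch below assumes the following reading of the construction: the first recursive rule stays a rule for p, the ith recursive rule (i > 1) is rewritten over the new predicate q_{i-1}, each rule s_i passes facts from q_i up to q_{i-1} (with p playing the role of q_0), and the basis rules define q_n. The representation of rules is the assumed one used for illustration (head arguments, argument tuples of the recursive subgoals, EDB atoms), and the name `sequence_program`, the predicate names `q1`, `q2`, ... and the sample two-rule program are hypothetical.

```python
def sequence_program(recursive, basis, arity, p="p"):
    """Build the rules r'_i, s_i and b'_i of the sequenced program.
    recursive: list of (head_args, [args of each p-subgoal], edb_atoms),
               in the order r_1, ..., r_n
    basis:     list of (head_args, edb_atoms)
    Returns a list of (head_atom, body_atoms) rules."""
    n = len(recursive)

    def name(i):
        # q_0 is p itself, so rule i is always rewritten over q_{i-1}.
        return p if i == 0 else f"q{i}"

    vs = tuple(f"V{j}" for j in range(1, arity + 1))
    rules = []
    for i, (head_args, p_subgoals, edb) in enumerate(recursive, start=1):
        # r'_i: the rule r_i with every occurrence of p replaced by q_{i-1}.
        rules.append(((name(i - 1), head_args),
                      [(name(i - 1), a) for a in p_subgoals] + list(edb)))
        # s_i: q_{i-1}(V1, ..., Vk) :- q_i(V1, ..., Vk).
        rules.append(((name(i - 1), vs), [(name(i), vs)]))
    for head_args, edb in basis:
        # b'_i: the basis rules define the innermost predicate q_n.
        rules.append(((name(n), head_args), list(edb)))
    return rules

recursive = [(("X", "Y"), [("U", "Y")], [("a", ("X", "U"))]),
             (("X", "Y"), [("X", "U")], [("c", ("U", "Y"))])]
basis = [(("X", "Y"), [("e", ("X", "Y"))])]
for head, body in sequence_program(recursive, basis, arity=2):
    print(head, ":-", body)
```

Evaluated bottom-up, the resulting rules close the basis facts first under the last recursive rule and last under the first, which is the order described above.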

Example  1.18  For  the  program  of  Example  1.4,  the  program  Q  is  the  program 


Figure 1.13 sketches the idea behind the isomorphism between the closed sequenced expansions
of P and the closed expansions of Q. □

The gains of replacing P by Q are obtained in several ways. For example, the evaluation
of the transformed program Q can be pipelined. Most importantly, however, the detection
of sequencability can often set up further optimizations, as in Example 1.8. Sequencability
is also essential to the detection of separability ([24]) in linear programs.


Results 

Sequencability is not known to be decidable for any interesting classes of programs. In
Chapter 4, we will show that the detection of sequencability is undecidable for head-rectified,
single-IDB Datalog programs with only two recursive rules and 9 basis rules. Our treatment
provides a tight undecidability result for program equivalence.

Sequencability has only been studied for linear Datalog programs with two recursive
rules and a single basis rule, and the decidability of sequencability is open even on this
restricted domain. A variety of sufficient conditions have been proposed ([25], [17]) in this
case. In Chapter 2, we show that sequencability is NP-hard for such programs if EDB
predicates are allowed to appear up to 3 times in the body of each recursive rule, and in
fact that all the proposed sufficient conditions are also NP-hard in this case. We also show
that a popular sufficient condition is polynomial if no EDB predicates are allowed to repeat
in any recursive rule.

Finally, we provide in Section 1.5 a polynomial-space algorithm that is sufficient (but
not necessary) for the detection of sequencability in arbitrary Datalog programs.

1.4.5  Subtree  eliminations  as  a  descriptive  mechanism 

The normal forms implied by these problems are susceptible to a uniform description, as
the elimination of subtrees of a certain shape from the proof trees generated by a program.
As a dual to the concept of a closed top-down expansion, let us define a top-down expansion
to be open iff the only rules used to construct the expansion are recursive rules.

Definition 1.3 Let P be the canonical single-IDB program of the preceding subsection.

A subtree elimination instance is the program P and a finite set S of finite shapes over
the recursive rules r_1, ..., r_n (that is, the shapes define open expansions). Let Q be the set
of closed top-down expansions generated by P, such that no subtree of the expansion is
described by a shape in S. Then the answer to the instance is “yes” iff P = Q (recall that
we speak of any union of closed top-down expansions as a program). □

The set of such instances is called the subtree-elimination problem (SEP). Since Q
is a subset of the conjunctive queries generated by P, and by the theorem of Sagiv and
Yannakakis (Theorem 1.2), we may conclude that the answer to an SEP instance is “yes”
iff each conjunctive query generated by P is contained in some conjunctive query generated
by Q.

The  transformations  of  the  previous  sections  may  all  be  described  within  the  framework 
of  SEP.  The  idea  is  that  the  violations  of  the  required  normal  form  are  represented  as 
subtree  shapes. 
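
Selecting the expansions that avoid a given set of violation shapes can be sketched as a pattern match on nested lists. The Python sketch below rests on an interpretation that is our assumption rather than a statement of the text: a subtree is taken to be described by a shape if it agrees with the shape on every rule name, with each empty list in the shape standing for an arbitrary (possibly unexpanded) subtree. The names `matches` and `contains_violation` are hypothetical, and rule names are written `r1`, `r2` in the code.

```python
def matches(pattern, shape):
    # Assumption: an empty list in the pattern stands for an arbitrary subtree.
    if pattern == []:
        return True
    if shape == [] or pattern[0] != shape[0] or len(pattern) != len(shape):
        return False
    return all(matches(p, s) for p, s in zip(pattern[1:], shape[1:]))

def contains_violation(shape, forbidden):
    """True iff some subtree of `shape` matches a pattern in `forbidden`."""
    if shape == []:
        return False
    if any(matches(pattern, shape) for pattern in forbidden):
        return True
    return any(contains_violation(child, forbidden) for child in shape[1:])

# Hypothetical instance: one forbidden shape, r1 applied directly below r1.
forbidden = [["r1", ["r1", []]]]
print(contains_violation(["r1", ["r2"]], forbidden))          # depth 2, no violation
print(contains_violation(["r1", ["r1", ["r2"]]], forbidden))  # r1 below r1
```

This only filters shapes; deciding an SEP instance additionally requires the containment tests between the remaining expansions and the eliminated ones.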

One-boundedness 

Let P be the canonical single-IDB program of this section. Let a minimum-depth violation
of one-boundedness (or just a violation) be an open top-down expansion of depth 2, and let
S = {s_1, ..., s_n} be the set of shapes of the minimum-depth violations. It is easily seen
that the set of one-bounded top-down expansions generated by P is precisely the set of
expansions in which no subtree has the shape s_i for any i. Thus, P is one-bounded iff the
answer to the SEP instance <P, S> is “yes”.

Example 1.19 For the program of Example 1.2, the set S is the singleton {[r_1[r_1[]]]},
representing the single minimum-depth violation of Figure 1.14. □

Figure 1.14: Minimum-depth violation of one-boundedness.

Figure 1.15: Minimum-depth violation of basis-linearizability.

Linearizability by basis

Let a minimum-depth violation of right-linearity (or just a violation) be an open top-down
expansion of depth 2, in which the rightmost recursive subgoal that is a child of the root
is not expanded, and in which at least one other child of the root is expanded. Let S =
{s_1, ..., s_n} be the set of shapes of the minimum-depth violations. It is easily seen that the
set of right-linear top-down expansions generated by P is precisely the set of expansions in
which no subtree has the shape s_i for any i. Thus, P is linearizable by basis iff the answer
to the SEP instance <P, S> is “yes”.

Example 1.20 For the program of Example 1.3, the set S is the singleton {[r_1[r_1[][]][]]},
representing the single minimum-depth violation of Figure 1.15. □

Sequencability 

Let a minimum-depth violation of sequencability (or just a violation) be an open top-down
expansion of depth 2 in which, if r_i is used to expand the root, then at least one child of
the root is expanded through some rule r_j such that j < i. Let S = {s_1, ..., s_n} be the
set of shapes of the minimum-depth violations. It is easily seen that the set of sequenced
top-down expansions generated by P is precisely the set of expansions in which no subtree
has the shape s_i for any i. Thus, P is sequencable iff the answer to the SEP instance
<P, S> is “yes”.

Figure 1.16: Minimum-depth violation of sequencability.

Example 1.21 For the program of Example 1.4, the set S is the singleton {[r_2[r_1[][]]]}.
The corresponding minimum-depth violation is depicted in Figure 1.16. □
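The three definitions of a minimum-depth violation can be checked mechanically on small programs. The following Python sketch is an illustration only, not part of the original development. It assumes that a shape is encoded as a nested tuple (a leaf [] is the empty tuple, and an expanded node is a rule index followed by one child per recursive subgoal), and that a program is summarised by a hypothetical `arity` table giving the number of recursive subgoals of each recursive rule. The arities used in the usage note are assumptions about Examples 1.2, 1.3 and 1.4.

```python
from itertools import product

LEAF = ()  # the unexpanded shape []


def depth2_open_shapes(arity):
    """Yield every open expansion shape of depth exactly 2.

    `arity` maps a rule index to the number of recursive subgoals of that rule.
    A shape is LEAF or a tuple (rule, child_1, ..., child_k).
    """
    rules = sorted(arity)
    # A child of the root is either a leaf or is expanded once through some rule.
    options = [LEAF] + [(r,) + (LEAF,) * arity[r] for r in rules]
    for root in rules:
        for children in product(options, repeat=arity[root]):
            if any(child != LEAF for child in children):
                yield (root,) + children


def violates_right_linearity(shape):
    """Rightmost child of the root is a leaf, and some other child is expanded."""
    children = shape[1:]
    return children[-1] == LEAF and any(c != LEAF for c in children[:-1])


def violates_sequencing(shape):
    """Some child of the root is expanded through a rule of smaller index."""
    root = shape[0]
    return any(c != LEAF and c[0] < root for c in shape[1:])


def show(shape):
    """Render a shape in the bracket notation used in the text."""
    if shape == LEAF:
        return "[]"
    return "[r%d%s]" % (shape[0], "".join(show(c) for c in shape[1:]))
```

Under the assumption that Example 1.2 has one linear recursive rule, Example 1.3 has one bilinear recursive rule, and Example 1.4 has a bilinear rule r_1 and a linear rule r_2, the enumeration yields the singletons of Examples 1.19, 1.20 and 1.21: `depth2_open_shapes({1: 1})` for one-boundedness, `depth2_open_shapes({1: 2})` filtered by `violates_right_linearity`, and `depth2_open_shapes({1: 2, 2: 1})` filtered by `violates_sequencing`.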


1.5  Subtree-elimination  algorithms 

We now present a uniform framework for the generation of sufficient conditions for the
solution of an SEP instance. In each case, the program that we consider is the generic
single-IDB program P of Section 1.4.1.

1.5.1  Basis-independence 

Recall that an expansion is termed open if only recursive rules are used in the construction
of the expansion, and closed if all the leaves are EDB predicates. Our sufficient conditions
will focus on the rectification of open top-down expansions; that is, we will show that
every open expansion generated by P is contained in a normal-form open expansion. This
procedure is sufficient to prove that every closed top-down expansion is contained in a closed,
normal-form expansion, as we show below; the only wrinkle is introduced if the containing
expansion has depth 0. Recall that the top-down expansion of depth 0 (the expansion with
shape []) is the top-down expansion generating the conjunctive query

p(X_1, ..., X_k) :- p(X_1, ..., X_k),

where the X's are distinct variables. Now, assume that an expansion T_1 is contained in [],
and that f is the containment mapping proving the containment. By the properties of the
containment mapping, T_1 must have some leaf that is syntactically identical to its root.
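The role of the containment mapping can be made concrete with a small search procedure. The Python sketch below is an illustration under stated assumptions rather than the procedure used in this report: a conjunctive query is encoded as a pair (head, body), an atom is a pair (predicate, argument tuple), all arguments are variables (constants are omitted for brevity), and the function name `find_containment_mapping` is hypothetical. It performs a plain backtracking search for a mapping that sends the head of one query to the head of the other and every body atom of the first to some body atom of the second.

```python
def find_containment_mapping(q_from, q_to):
    """Search for a containment mapping from query q_from into query q_to.

    A query is (head, body); an atom is (predicate, args), where args is a
    tuple of variable names. Returns a dict of variable bindings, or None.
    """
    (head_from, body_from), (head_to, body_to) = q_from, q_to

    def bind(atom, target, env):
        # Try to extend env so that atom is sent onto target.
        (pred, args), (tpred, targs) = atom, target
        if pred != tpred or len(args) != len(targs):
            return None
        env = dict(env)
        for a, t in zip(args, targs):
            if env.setdefault(a, t) != t:
                return None
        return env

    def search(i, env):
        # Choose a destination for the i-th body atom, backtracking on failure.
        if i == len(body_from):
            return env
        for target in body_to:
            new_env = bind(body_from[i], target, env)
            if new_env is not None:
                result = search(i + 1, new_env)
                if result is not None:
                    return result
        return None

    start = bind(head_from, head_to, {})
    return None if start is None else search(0, start)
```

As a usage example, take the depth-0 expansion `(("p", ("X1", "X2")), [("p", ("X1", "X2"))])`. A mapping from it into an expansion exists only if that expansion has a p-leaf identical to its root, which is the observation made in the paragraph above. The search is exponential in the worst case; it is meant only to fix ideas.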

Theorem 1.5 Assume that T_1 is an open top-down expansion that is contained in the
expansion of depth 0. Then every closed top-down expansion obtained by applying basis
rules to the leaves of T_1 is contained in an initialisation rule.

Proof. Since T_1 is contained in the depth-0 expansion, there must be some leaf (labelled l,
say) that is syntactically identical to the root. Hence, if the basis rule b_i is used to expand
node(l) in T_1, then the result is contained in b_i by the Expansion Theorem. □



By our assumption that all violations in our SEP instance are open, we conclude that
each basis rule (considered as a top-down expansion) is in the indicated normal form; that
is, no such expansions are prohibited. Now, every closed top-down expansion is either a
basis rule, or is obtained by expanding every p-leaf in an open expansion through some
basis rule. Hence, by the Expansion Theorem and Theorem 1.5, the rectification of all open
expansions is sufficient to prove that the answer to a given SEP instance is “yes”; that
is, that every closed top-down expansion is contained in a closed, normal-form expansion.
Note that this result holds independently of the set of initialisation rules in P; that is, the
transformations that we will consider are basis-independent.

The following theorem shows that if an expansion T is contained in the empty expansion,
then the result of expanding any leaf of T through the expansion U is contained in either
the empty expansion or in U.

Theorem 1.6 Assume that T_1 is an open top-down expansion that is contained in the
expansion of depth 0 (the expansion with shape []). Let T_3 be the expansion obtained by
expanding node(l) in T_1 through the expansion T_2. Then, either T_3 ⊆ [], or there is a
containment mapping from T_21 (a replica of T_2) into T_3 such that for any leaf node([l|j])
in T_21, the destination of node([l|j]) is node(j) in T_3.

Proof. Let f be a containment mapping from [] into T_1. As before, there is some leaf (say,
node(k)) in T_1 that is syntactically identical to the root of T_1. There are two cases. If
k ≠ l, then f is a containment mapping from [] into T_3. If k = l, then the subtree rooted
at node(l) in T_3 is τ(T_2), where τ is the mgu of the root of T_2 with node(l) in T_1; hence,
τ ∘ ren^{-1} is a containment mapping from T_21 into T_3, where ren is the rename function
that creates T_21 from T_2. □

1.5.2  Generating  sufficient  conditions 

Ramakrishnan, Sagiv, Ullman and Vardi ([25]) have proposed a framework for the construction
of conditions that are sufficient (but not necessary) to prove that the proof trees of a
program satisfy a normal form. Let <P, S> be an SEP instance, where S = {v_1, ..., v_n}
is a complete set of violations to the desired normal form. The process of [25] consists of
two steps:

1. For each v_i, show that there is a containment mapping of a restricted form from some
normal-form top-down expansion q_i into v_i.

2. Show that the results of (1) may be used to inductively rectify all top-down expansions
of P.

Such a technique is called a proof-tree transformation technique.

One-boundedness 

By Theorem 1.5 and the Expansion Theorem, the containment of every open expansion
of depth at least two in an open expansion of depth at most one suffices to prove
one-boundedness in P.


Figure  1.17:  Illustrating  Theorem  1.7. 


Theorem 1.7 Assume that every minimum-depth violation of one-boundedness is contained
in an open expansion of depth at most 1. Then P is one-bounded.

Proof. By induction on the number i of rule applications in the top-down expansion, we
will show that every expansion of depth at least 2 is contained in an expansion of depth at
most 1. The basis, i = 2, follows by assumption. For the induction, assume the truth of
the hypothesis for 2 ≤ l < i.

Consider a top-down expansion T_1 obtained through i rule applications. T_1 is obtained
by expanding some leaf node(n) in some expansion T_2 (constructed using i − 1 rule
applications) through some rule r_j (see Figure 1.17). By our inductive hypothesis, T_2 is contained
in a tree T_3 of depth at most one; assume that f is a containment mapping proving the
containment. If T_3 has depth 0, then by Theorem 1.6, T_1 ⊆ [] or T_1 ⊆ r_j, and our result
follows.

Assume that T_3 has depth 1. By the Expansion Theorem, we may expand each node
in T_3 whose destination under f is node(n) through r_j, to produce an expansion T_4 of
depth 2 such that T_1 ⊆ T_4 (see Figure 1.17). However, by assumption, T_4 is contained in a
top-down expansion of depth at most 1, and our result follows because the composition of
containment mappings is a containment mapping. □


Example 1.22 Consider the program of Example 1.2. The minimum-depth violation of
one-boundedness is contained in the top-down expansion of depth 1, under the identity
mapping f(X) = X, f(Y) = Y (see Figure 1.18). Hence, the program is one-bounded. □

A similar procedure on closed top-down expansions may be used to show that
one-boundedness is decidable. That is, P is one-bounded iff every closed expansion of depth 3
is contained in a closed expansion of depth at most 2. Our treatment has, however, been
motivated by the interests of exposition, and by the setting up of an efficient algorithm for
the detection of one-boundedness; the latter is contained in the next section.

Figure 1.18: Proving one-boundedness.

Figure 1.19: An acceptable mapping.


Linearizability  by  basis 

By definition, the open expansion of depth 0 is right-linear. By Theorem 1.5 and the
Expansion Theorem, if we show that every non-right-linear open expansion is contained
in a right-linear open expansion, then we may conclude that P is basis-linearizable. The
following treatment is an extension of a result of [25]; they consider a Datalog program with
a single recursive rule, which is bilinear.

Let T_1 and T_2 be open expansions, let node(l) be the rightmost p-leaf in T_1 and let
node(k) be the rightmost p-leaf in T_2. We say that a containment mapping f : T_2 → T_1 is
acceptable for right-linearity (or just acceptable) iff node(l) is the destination (under f) of
no leaf, or is the destination of node(k) only.

Example 1.23 Consider the program of Example 1.3. Figure 1.19 shows that the
minimum-depth violation of right-linearity is contained in a right-linear expansion under the
acceptable mapping f(X) = X, f(Y) = Y, f(A) = V, f(B) = U. □


Theorem 1.8 Assume that every minimum-depth violation of right-linearity is contained in
a right-linear expansion, and that the containment is provable by an acceptable containment
mapping. Then P is basis-linearizable.

Proof. By induction on the number i of rule applications in the top-down expansion, we
show that every non-right-linear expansion is contained in a right-linear expansion. The
basis, i = 0 (representing the top-down expansion of depth 0), is trivial. For the induction,
assume the truth of the hypothesis for 0 ≤ l < i, where i > 0.

Figure 1.20: T_1 in Theorem 1.8.


Consider a top-down expansion T_1 obtained through i rule applications. If T_1 is
right-linear, then the result follows; if T_1 is a minimum-depth violation, then the result follows by
assumption. Otherwise, T_1 is obtained by expanding some leaf node(n) in some expansion
T_2 (constructed using i − 1 rule applications) through some rule r_j (see Figure 1.20).

By our inductive hypothesis, T_2 is contained in a right-linear tree T_3. If T_3 has depth 0,
then our result follows by Theorem 1.6.

Assume that T_3 has depth at least 1. By the Expansion Theorem, we may expand each
node in T_3 whose destination under f is node(n) through r_j, to produce an expansion T_4
such that T_1 ⊆ T_4. If T_4 is right-linear, the result follows. Otherwise, T_4 is a tree that
is obtained from some right-linear tree by expanding some non-rightmost p-leaves in the
right-linear tree through the rule r_j. We call such an expansion almost right-linear.

We may rectify T_4 in a bottom-up manner, using the Splicing Theorem and the fact
that the composition of containment mappings is a containment mapping, as follows. At
any stage in the process, we have a situation as shown in Figure 1.21; that is, we have an
almost-right-linear tree T_5 whose rightmost p-node is expanded through a violation V such
that the rightmost p-node of V is a leaf or is expanded through some right-linear tree T_6.

By assumption, there is an acceptable mapping from some right-linear expansion R into
V. Hence, by the Splicing Theorem, we may splice R in for V; the p-leaves of R remain
leaves, except for the rightmost p-leaf in R, which may be expanded through the right-linear
tree T_6. An inductive repetition suffices to complete the rectification. □

Figure 1.21: The rectification procedure of Theorem 1.8.

By  the  observations  of  Example  1.23,  we  may  now  claim  that  the  transitive-closure 
program  of  Example  1.3  is  basis-linearizable. 

Complexity If P is a Datalog program, then the minimum-depth violations may be
constructed in polynomial time, and the containments under acceptable mappings may be
tested in polynomial space by the chase algorithm of [25]. In the non-Datalog case, the chase
may not terminate; however, we may generate sufficient conditions by placing a bound on
the depth of the containing right-linear expansion. A suitable heuristic may be one based
on size preservation; that is, we may insist that the containing query be no bigger than the
contained expansion.

Sequencability 

By convention, the open expansion of depth 0 is sequenced. By Theorem 1.5 and the
Expansion Theorem, if we show that every non-sequenced open expansion is contained in
a sequenced open expansion, then we may conclude that P is sequencable. The following
result is an extension of an algorithm proposed independently by [25] and [17]; they consider
Datalog programs with only two recursive rules, both of which are linear.

For any i < j, define an r_i-r_j violation of sequencability (or just an r_i-r_j violation) to
be an open expansion of depth 2, such that r_j is used to expand the root, one or more of
the children are expanded through r_i, and all other children of the root are leaves.

Let us further define an r_i-r_j-sequenced expansion to be a sequenced expansion using
only the rules r_i and r_j, such that r_i is used at most once (that is, r_i can only be used to
expand the root).

Example 1.24 Consider the program of Example 1.4. The minimum-depth violation of
sequencability in this program is the expansion [r_2[r_1[][]]] (see Figure 1.22); this expansion is
an r_1-r_2 violation. The minimum-depth violation is contained in the r_1-r_2-sequenced
expansion [r_1[r_2[]][r_2[]]], as shown in the figure; the containment follows because both expansions
have the same root and leaves. □

Let us define a node violation of r_i to be the root of an r_i-r_j violation for any j.

Theorem 1.9 Assume that for all i and j, each r_i-r_j violation is contained in some
r_i-r_j-sequenced expansion. Then P is sequencable.

Proof. By induction on the number m of rule applications in a top-down expansion T
generated by P, we prove that T is contained in a sequenced expansion. The proof is
similar to that of Theorem 1.8; we merely sketch the inductive step. Assume that the mth
rule applied is r_i. By the inductive hypothesis and the Expansion Theorem, T is contained
in some top-down expansion T', which is obtained by expanding some leaves in a sequenced
expansion through r_i. We inductively rectify T' using the Splicing Theorem, by “bubbling
up” the node violations of r_i as follows. Consider any node violation at maximum depth;
assume it is the root of the r_i-r_j violation V, in which some leaves may be expanded through
sequenced expansions in which every rule used is of the form r_k for k ≥ j (where j > i). By
assumption, V is contained in an r_i-r_j-sequenced expansion S. By the Splicing Theorem,
we may splice S in for V; this process either reduces the number of node violations of r_i
by 1 or replaces the node violation by a node violation at a smaller depth. Further, the
splicing-in of S for V does not create any r_k-r_l violations for k ≠ i. Hence, we may proceed
in stages, in each stage removing all node violations of r_i at maximum depth, until the
tree T' is sequenced. The situation is depicted in Figure 1.23. In the figure, the subtrees
labelled R are sequenced subtrees in which the rules r_1, ..., r_i are not used. □

Figure 1.22: Proving sequencability.

By the observations of Example 1.24, we may conclude that the program of Example
1.4 (computing the symmetric, transitive closure) is sequencable.


Complexity As before, the tests of Theorem 1.9 may be performed in polynomial space
for Datalog programs through the chase algorithm of [25]. In the non-Datalog case, we may
construct sufficient conditions by placing a bound on the depth of each containing
r_i-r_j-sequenced expansion, perhaps by using the size-maintaining heuristic that was presented in
the previous subsection.

1.5.3  Fast  algorithms 

Finally, we present restricted algorithms for the detection of one-boundedness and
basis-linearizability. In Chapter 2, we will show that for Datalog programs, these algorithms
are polynomial in the size of P: they perform polynomial-time reductions to restricted
programs for which the detection of the appropriate normal form is known to be decidable
in polynomial time. The idea behind the algorithms is to further restrict the destinations
of an atom under a mapping that proves the containment of a violation in a normal-form
expansion. That is, we essentially rename different occurrences of the recursive atoms in
each rule.

Figure 1.23: The rectification process of Theorem 1.9.

Let us assume that for any i and j, the jth subgoal in the rule r_i is given the superscript
ij, and that this superscript is carried through all top-down expansions (see footnote). That is, if r_i is
used to expand a p-atom and the jth subgoal has principal functor q, then the jth child of
the expanded atom is referred to as a q^{ij}-atom (or just q^{ij}). The root of the expansion, and
the body of the top-down expansion of depth 0, are merely p-atoms. Figure 1.24 illustrates
this notation on the non-right-linear top-down expansion of Figure 1.19, representing the
minimum-depth violation of basis-linearizability in the program of Example 1.3.


One-boundedness

For all i, define a violation of degree i to be a minimum-depth violation of one-boundedness
in which exactly i children of the root are expanded. Now, for any open top-down expansions
T_1 and T_2, we say that a containment mapping f : T_1 → T_2 is restricted if the destination
of any q^{ij}-atom in T_1 under f is a q^{ij}-atom in T_2. Note that the number and size of
the violations of degree 1 are polynomial in the size of P, and that each violation can be
constructed in polynomial time.


Footnote: Note that this concept is distinct from the idea of labels in a top-down expansion.

Figure 1.24: Notation.

Figure 1.25: W' in Theorem 1.10.

Theorem 1.10 Assume that every violation of degree 1 is contained in a top-down
expansion of depth no more than 1, and that the containments are proved by restricted mappings.
Then, P is one-bounded.

Proof. We prove by induction on i that every violation of degree i is contained in a tree of
depth at most one, and that the containment is proved by a restricted mapping; our result
then follows by Theorem 1.7. The basis, i = 1, follows by assumption.

Now, assume the truth of the inductive hypothesis for 1 ≤ l < i. Consider any violation
T of degree i. This violation must be obtained by expanding some superscripted p-leaf at
depth 1 in some violation V of degree i − 1 through some rule r_j. By our inductive
hypothesis, V is contained in some expansion W of depth at most 1 under a restricted
containment mapping.

If the depth of W is 0, then by Theorem 1.6 and the Expansion Theorem, T is contained
in [] or r_j under a restricted mapping.

If the depth of W is 1, then let W' be the result of expanding the p-leaf in W with the
same superscript through the rule r_j (see Figure 1.25). By the Expansion Theorem, T is
contained in W' under a restricted mapping. However, W' is a violation of degree 1, and is
(by assumption) contained in a tree of depth at most 1 under a restricted mapping; our
result follows because the composition of restricted mappings is a restricted mapping. □

This theorem leads to the following algorithm that is sufficient (but not necessary) to
prove one-boundedness.

Algorithm  1.1 

INPUT: A single-IDB program P.

OUTPUT: “yes” only if P is one-bounded.

(1) Construct all violations of degree 1, and all expansions of depth at most 1.

(2) If each violation of degree 1 is contained in an expansion of depth at most 1 under a
restricted mapping, then answer “yes”; otherwise, answer “no”.


□ 
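Step (2) of Algorithm 1.1 can be illustrated with a small sketch, assuming that Step (1) has already produced the expansions. This is not the polynomial-time test deferred to Chapter 2; it is a plain backtracking search, and the encoding is an assumption made for illustration: an expansion is a pair (head arguments, body), and each body atom is a triple (tag, predicate, arguments), where the tag is the (rule, subgoal) superscript, or None for the untagged p-atom in the body of the depth-0 expansion. The function names are hypothetical.

```python
def restricted_mapping_exists(expansion, violation):
    """True if a restricted containment mapping sends expansion into violation.

    Both arguments are (head_args, body); each body atom is
    (tag, predicate, args) with args a tuple of variable names.
    """
    (head_e, body_e), (head_v, body_v) = expansion, violation

    def extend(env, args, targets):
        # Bind each source variable consistently to a target variable.
        env = dict(env)
        for a, t in zip(args, targets):
            if env.setdefault(a, t) != t:
                return None
        return env

    def search(i, env):
        if i == len(body_e):
            return True
        tag, pred, args = body_e[i]
        for vtag, vpred, vargs in body_v:
            # Restricted: a tagged atom may only be sent to an atom
            # carrying the same superscript.
            same_tag = tag is None or vtag == tag
            if same_tag and vpred == pred and len(vargs) == len(args):
                new_env = extend(env, args, vargs)
                if new_env is not None and search(i + 1, new_env):
                    return True
        return False

    start = extend({}, head_e, head_v)
    return start is not None and search(0, start)


def algorithm_1_1(violations, shallow_expansions):
    """Answer "yes" only if each degree-1 violation has a restricted container."""
    ok = all(
        any(restricted_mapping_exists(e, v) for e in shallow_expansions)
        for v in violations
    )
    return "yes" if ok else "no"
```

For instance, for a hypothetical program whose only recursive rule is p(X,Y) :- a(X), p(Z,Y), the single degree-1 violation has body a(X), a(Z), p(Z2,Y), and the depth-1 expansion maps into it under a restricted mapping, so the sketch answers “yes”. For the hypothetical rule p(X,Y) :- e(X,Z), p(Z,Y), no such mapping exists and the answer is “no”, which means only that the sufficient condition fails.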

There are a polynomial number of degree-1 violations, so Step (1) may be accomplished
in polynomial time. Step (2) involves a polynomial number of containment tests, each
involving the existence of a restricted mapping from an expansion of depth 0 or 1 into
a violation. Testing for the existence of a mapping from a depth-0 expansion into any
expansion is clearly polynomial. Consider tests of the second kind; that is, tests for the
existence of a restricted mapping from a depth-1 expansion T into a violation V. Each atom
in T has at most 2 possible destinations in V under a restricted mapping, and we will show
in Chapter 2 that such tests may be accomplished in polynomial time. Hence, Algorithm
1.1 is in P.

Basis-linearizability

A similar process may be applied to create sufficient conditions for the detection of
basis-linearizability.

As before, we define a violation of degree i to be a minimum-depth violation of
right-linearity in which exactly i children of the root are expanded. Now, for any violation T_1
and right-linear expansion T_2, we say that a containment mapping f : T_1 → T_2 is restricted
if for any q^{ij}-leaf in T_1 that is not the rightmost p-leaf in T_1, the destination of this q^{ij}-atom
in T_1 under f is a q^{ij}-atom in T_2. Note that a restricted containment mapping is acceptable
for right-linearity. Again, we observe that the number and size of the violations of degree 1
are polynomial in the size of P, and that each violation can be constructed in polynomial
time.

Theorem 1.11 Assume that every violation of degree 1 is contained in a right-linear
top-down expansion, and that the containments are proved by restricted mappings. Then, P is
basis-linearizable.

Proof. We prove by induction on i that every violation of degree i is contained in a
right-linear expansion, and that the containment is proved by a restricted mapping; our result
then follows by Theorem 1.8. The basis, i = 1, follows by assumption.

Figure 1.26: W' in Theorem 1.11.

Now, assume the truth of the inductive hypothesis for 1 ≤ l < i. Consider any violation
T of degree i. This violation must be obtained by expanding some superscripted p-leaf at
depth 1 in some violation V of degree i − 1, such that this leaf is not the rightmost child of
the root, through some rule r_j (see Figure 1.26). By our inductive hypothesis, V is contained
in some right-linear expansion W under a restricted containment mapping.

If the depth of W is 0, then by Theorem 1.6, T is contained in r_j or [] under a restricted
mapping.

If the depth of W is 1 or larger, then let W' be the result of expanding every p-leaf in
W with the same superscript whose destination is the newly-expanded leaf in T through the
rule r_j. By the Expansion Theorem, T is contained in W' under a restricted mapping. Note
that W' is obtained by expanding some p-leaves in a right-linear expansion through r_j; that
is, at any stage of the tree, at most two sibling p-atoms are expanded. By assumption, each
violation of degree 1 is contained in a right-linear expansion under a restricted mapping,
and a bottom-up rectification as in the proof of Theorem 1.8 serves to complete the proof. □

This theorem yields the following sufficient (but not necessary) condition for the
detection of basis-linearizability. We will show in Chapter 2 that the algorithm is polynomial;
the key is to test whether every degree-1 violation is contained in a right-linear tree of depth
at most 2 under a restricted mapping (see footnote).

Algorithm  1.2 

INPUT: A single-IDB program P.

OUTPUT: “yes” only if P is basis-linearizable.

(1) Construct all violations of degree 1, and all right-linear expansions of depth at most 2.

(2) If each violation of degree 1 is contained in a right-linear expansion of depth at most 2
under a restricted mapping, then answer “yes”; otherwise, answer “no”.

□ 

There are a polynomial number of degree-1 violations and a polynomial number of
right-linear trees of depth at most 2, so Step (1) may be accomplished in polynomial time. Step
(2) involves a polynomial number of containment tests, each involving the existence of a
restricted mapping from a right-linear expansion of depth 0, 1 or 2 into a violation. As we
mentioned in our treatment of one-boundedness in the previous subsection, testing for the
existence of a mapping from a depth-0 expansion into any expansion is polynomial.
Consider tests for the existence of a restricted mapping from a depth-1 or depth-2 right-linear
expansion T into a violation V. Let N be the size of V; remember that N is polynomial
in the size of the program P. By the definition of a restricted mapping, one atom (say, α)
in T has up to N possible destinations in V, and each other atom has at most 2 possible
destinations. Hence, by a case analysis on the destinations of α, we may reduce the test
to N tests for the existence of a mapping from T into V such that each atom in T has
at most 2 allowed destinations in V. We will show in Chapter 2 that such tests may be
accomplished in polynomial time. Hence, Algorithm 1.2 is in P.

Sequencability 

A  similar  algorithm  may  be  devised  for  the  detection  of  sequencability,  but  we  omit  the 
treatment  in  the  interest  of  brevity. 

1.6  Overview  of  Chapters  2,  3  and  4 

In the remainder of this report, we will investigate the complexity of the subtree
elimination problem, focussing on the detection of one-boundedness, basis-linearizability and
sequencability.

In Chapter 2, we investigate the complexity of detecting containments among conjunctive
queries. We extend the conjunctive query containment problem to the k-containment
problem, and show that k-containment and kSAT are essentially the same problem. This
investigation results in a complete description of the complexity of conjunctive query
containment. These results are then extended to show that one-boundedness, basis-linearizability
and sequencability are NP-hard for restricted classes of programs. In this chapter, we also
prove that the sufficient conditions of Section 1.5.3 (Algorithms 1.1 and 1.2) are in P.

Footnote: The algorithm is polynomial for any given choice of depth for the right-linear trees.

Table 1.1

                         Polynomial time     NP-hard                    Undecidable

One-boundedness          linear sirup,       linear sirup,              never
                         ≤ 1 reps            ≤ 4 reps

Basis-linearizability    bilinear sirup,     bilinear sirup,            1 nonlinear rule,
                         ≤ 1 reps            ≤ 4 reps                   5 basis rules

Sequencability           ???                 2 recursive rules          2 recursive rules,
                                             (both linear),             9 basis rules
                                             ≤ 3 reps, 1 basis rule
In  Chapter  3,  we  provide  a  decision  procedure  for  the  detection  of  basis-linearizability 
in  a  class  of  recursive  programs.  The  decision  procedure  can  be  seen  to  be  polynomial  using 
the  techniques  of  Chapter  2. 

Finally, in Chapter 4, we show that sequencability and basis-linearizability are undecidable for multi-rule, nonlinear programs. The techniques of this chapter also provide tight undecidability results for the detection of program equivalence.

1.6.1  Complexity  results 

The results of our investigation are presented in Table 1.1. The programs considered are all Datalog. The expression ≤ $i$ reps means that each recursive rule in the program has at most $i$ occurrences of an EDB predicate in its body.

Chapter 2


The  complexity  of  conjunctive 
query  containment 

2.1  Introduction 


In this chapter, we will characterize the complexity of testing containments among pairs of conjunctive queries. Recall that a conjunctive query is a single-rule, nonrecursive program, and that the theorem of Sagiv and Yannakakis (Theorem 1.2) relates containments among recursive programs to containments among the conjunctive queries generated by these programs.

In Section 2.2, we characterize the complexity of the conjunctive query containment problem. Our results are obtained by defining and analysing a closely related problem, which we term the $k$-containment problem. We also show that a restricted version of the 2-containment problem is in NC, and is hence efficiently computable in parallel.

In Section 2.3, we extend the results of Section 2.2 to provide NP-hardness results for the one-boundedness, sequencability and basis-linearizability problems. We also justify the title of Section 1.5.3 by showing that the algorithms of that section are in P, as we claimed.


2.2 The $k$-containment problem


Let us consider the complexity of the conjunctive query containment problem. In this section, we will define the $k$-containment problem as a parametrized version of the conjunctive query containment problem, and show that while the 2-containment problem is in P, the 3-containment problem is NP-complete. We also show that for Datalog (that is, function-free) conjunctive queries, the 2-containment problem is in NLOGSPACE (and hence in NC).


2.2.1 Conjunctive query containment

Recall from Section 1.3.1 that a conjunctive query is a single-rule, nonrecursive program. That is, a conjunctive query is of the form

$p(X)$ :- $B$.

where $B$ is an arbitrary conjunction of atomic formulae. Strictly speaking, the predicate $p$ must not occur within $B$; however, we will ignore this restriction when it is clear from the context that such an expression is to be applied nonrecursively.

Consider the (not necessarily function-free) conjunctive queries

$C_1$: $a_0(U_0)$ :- $a_1(U_1), \ldots, a_m(U_m)$.

$C_2$: $b_0(W_0)$ :- $b_1(W_1), \ldots, b_l(W_l)$.

We assume without loss of generality that there are no repetitions of any atomic formula in the body of $C_1$, or in the body of $C_2$ (although predicates may appear repeatedly). That is, for no $i$ and $j$, $i \neq j$, are $a_i(U_i)$ and $a_j(U_j)$ syntactically identical (or $b_i(W_i)$ and $b_j(W_j)$ identical). Subgoal repetitions can clearly be identified and eliminated in polynomial time. The conjunctive queries $C_1$ and $C_2$ will serve as generic conjunctive queries in the remainder of this chapter.

The  theorem  of  Chandra  and  Merlin 

The following treatment is a synopsis of Section 1.3.2, which presents a syntactic test for the containment of a conjunctive query in another (see Theorem 1.1). The basic tool used is the containment mapping, as described below in terms of the generic conjunctive queries $C_1$ and $C_2$ presented at the beginning of this section.

Let $f$ be a function on the symbols in $C_2$ that leaves constants unchanged. We may extend $f$ to terms (and atoms) in the obvious way; that is, we define $f(g(d_1, \ldots, d_k))$ to be $f(g)(f(d_1), \ldots, f(d_k))$, where the $d_i$ are arbitrary terms. Such a function is said to be a containment mapping from $C_2$ into $C_1$ if the following are both true.

1. $f(b_0(W_0)) = a_0(U_0)$.

2. For $1 \le j \le l$ there is an $i$, $1 \le i \le m$, such that $f(b_j(W_j)) = a_i(U_i)$.

If there is a containment mapping $f : C_2 \to C_1$, then for every atom $t$ in $C_2$, we say that $f(t)$ is the destination of $t$ under $f$. Since $f$ is a function and there are (by assumption) no subgoal repetitions in $C_1$, each atom in $C_2$ has a unique destination under $f$.

The theorem of Chandra and Merlin (Theorem 1.1) states that for any conjunctive queries $C_1$ and $C_2$, $C_1 \subseteq C_2$ if and only if there is a containment mapping $f : C_2 \to C_1$.

Example 2.1 Consider the conjunctive queries $C_3$ and $C_4$, as defined below.

$C_3$: $p(X)$ :- $a(X,B), b(A,B), b(C,B), b(D,D), c(B,B), c(C,B), c(A,D)$.

$C_4$: $p(X)$ :- $a(X,V), b(U,V), c(U,W)$.

The function $f$ defined by $f(X) = X$, $f(V) = B$, $f(U) = A$, $f(W) = D$ is a containment mapping from $C_4$ into $C_3$. The destination (under $f$) of $p(X)$ is $p(X)$, of $a(X,V)$ is $a(X,B)$, of $b(U,V)$ is $b(A,B)$ and of $c(U,W)$ is $c(A,D)$.

However, there is no containment mapping $g : C_3 \to C_4$. If there were such a mapping $g$, then the destination of $c(B,B)$ under $g$ would have to be $c(U,W)$ (the only $c$-atom in $C_4$); however, the requirement $g(c(B,B)) = c(U,W)$ would imply $g(B) = U$ and $g(B) = W$, contradicting the assumed functionality of $g$. □
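The two conditions in the definition of a containment mapping can be checked mechanically. The following is a minimal sketch for function-free queries only, assuming a hypothetical encoding in which an atom is a pair `(predicate, arguments)` and a query is a pair `(head, body)`; the names `apply_mapping` and `is_containment_mapping` are illustrative and do not appear in the report.

```python
def apply_mapping(f, atom):
    """Apply a symbol mapping f to a function-free atom; unmapped symbols are unchanged."""
    pred, args = atom
    return (pred, tuple(f.get(a, a) for a in args))


def is_containment_mapping(f, c_from, c_to):
    """Check conditions 1 and 2 for a candidate mapping f from c_from into c_to."""
    head_from, body_from = c_from
    head_to, body_to = c_to
    if apply_mapping(f, head_from) != head_to:  # condition 1: head maps to head
        return False
    # condition 2: every body atom lands on some body atom of the target
    return all(apply_mapping(f, atom) in body_to for atom in body_from)


# The queries of Example 2.1 (assumed encoding).
C3 = (("p", ("X",)), [("a", ("X", "B")), ("b", ("A", "B")), ("b", ("C", "B")),
                      ("b", ("D", "D")), ("c", ("B", "B")), ("c", ("C", "B")),
                      ("c", ("A", "D"))])
C4 = (("p", ("X",)), [("a", ("X", "V")), ("b", ("U", "V")), ("c", ("U", "W"))])

f = {"X": "X", "V": "B", "U": "A", "W": "D"}
g = {"X": "X", "A": "U", "B": "V", "C": "U", "D": "W"}  # one failed attempt from C3 into C4
```

Here `f` is the mapping of Example 2.1, while `g` is just one candidate in the opposite direction; the sketch only verifies a given mapping and does not search for one.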

2.2.2  Conjunct  mappings 

The theorem of Chandra and Merlin is based on the existence of a mapping on symbols. An alternative viewpoint, and one which lends itself to a description of the complexity of deciding conjunctive query containment, is based on the existence of a mapping on atoms.

Definition 2.1 Let $s$ and $t$ be two arbitrary atomic formulae. We say that there is a partial mapping from $s$ to $t$ (written $s \to t$) if there is a substitution for the variables in $s$ under which $s$ is made syntactically identical to $t$. □

Each partial mapping $s \to t$ (if it exists) implies a unique substitution for each variable in $s$; the set of such assignments is termed the assignment set induced by the partial mapping. Two assignment sets are said to be consistent if no variable is assigned a different value by the two assignment sets, and a pair of partial mappings is said to be consistent if the assignment sets induced by the mappings are consistent.

Example 2.2 In Example 2.1, the partial mapping $b(U,V) \to b(A,B)$ induces the assignment set $\{U := A, V := B\}$. Further, the partial mappings $a(X,V) \to a(X,B)$ and $b(U,V) \to b(A,D)$ are inconsistent, since the variable $V$ is assigned a different value by these mappings. Finally, as indicated in Example 2.1, there is no partial mapping $c(B,B) \to c(U,W)$. □

The existence of a partial mapping may be tested and the induced assignment set generated through term-matching (Ullman ([36])), and may hence be accomplished in time that is polynomial in the total size of $s$ and $t$. Testing for consistency is clearly polynomial in the size of the assignment sets. The following theorems show that for function-free atoms, both these procedures may be accomplished in LOGSPACE. The central idea is the fact that in the function-free case, the existence of a partial mapping and the consistency of a pair of partial mappings may each be tested by testing the equality or inequality of arguments in the relevant atoms.

Example 2.3 Consider the queries of Example 2.1. There is a partial mapping $b(U,V) \to b(A,B)$, but no partial mapping $c(B,B) \to c(U,W)$. In the latter case, both arguments of $c(B,B)$ are equal, but the arguments of $c(U,W)$ are unequal. □

For any atom $p(X)$ and integer $i$, let $p(X)[i]$ denote the $i$th argument of $p(X)$.

Theorem 2.1 Let $p(X)$ and $q(Y)$ be atoms. Then, there is a partial mapping $p(X) \to q(Y)$ iff the following conditions are true.

1. $p$ and $q$ are the same predicate, and have the same arity (say, $n$).

2. For $1 \le i \le n$, if $p(X)[i]$ is the constant $c$, then $q(Y)[i]$ is the same constant $c$.

3. For $1 \le i < j \le n$, if $p(X)[i] = p(X)[j]$, then $q(Y)[i] = q(Y)[j]$.

Proof. If conditions 1, 2 and 3 are met, then the set

$\{p(X)[i] := q(Y)[i] \mid 1 \le i \le n,\ p(X)[i] \text{ is a variable}\}$

is a substitution under which $p(X)$ is made syntactically identical to $q(Y)$. The converse follows by the definition of a partial mapping. □
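The three conditions of Theorem 2.1 translate directly into an argument-by-argument test. The sketch below is one way this might look for function-free atoms; it assumes (as a convention of this example only) that symbols beginning with an upper-case letter are variables of the source atom and all other symbols are constants, and the function names are hypothetical.

```python
def is_variable(symbol):
    """Assumed convention: upper-case initial means variable, anything else is a constant."""
    return symbol[:1].isupper()


def partial_mapping(s, t):
    """Return the assignment set induced by s -> t as a dict, or None if no partial mapping exists."""
    (p, s_args), (q, t_args) = s, t
    if p != q or len(s_args) != len(t_args):  # condition 1: same predicate and arity
        return None
    assignment = {}
    for a, b in zip(s_args, t_args):
        if is_variable(a):
            # condition 3: a repeated variable must meet equal arguments in t
            if assignment.setdefault(a, b) != b:
                return None
        elif a != b:  # condition 2: constants must match exactly
            return None
    return assignment
```

For the atoms of Example 2.3, `partial_mapping(("b", ("U", "V")), ("b", ("A", "B")))` yields the assignment set with `U := A` and `V := B`, while the mapping from `c(B,B)` to `c(U,W)` is rejected by the repeated-variable check.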

Theorem 2.2 Let $p$ be a predicate of arity $n$ and $q$ a predicate of arity $m$. Assume that the partial mappings $p(X) \to p(Y)$ and $q(W) \to q(Z)$ exist. These partial mappings are consistent iff for $1 \le i \le n$, $1 \le j \le m$, if $p(X)[i] = q(W)[j]$ then $p(Y)[i] = q(Z)[j]$.

Proof. If the condition is met, then the set

$\{p(X)[i] := p(Y)[i] \mid 1 \le i \le n,\ p(X)[i] \text{ is a variable}\} \cup \{q(W)[j] := q(Z)[j] \mid 1 \le j \le m,\ q(W)[j] \text{ is a variable}\}$

is single-valued for each variable in $X \cup W$, and is a substitution under which $p(X)$ is made syntactically identical to $p(Y)$ and $q(W)$ made syntactically identical to $q(Z)$. The converse follows by the definition of consistency. □
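Consistency of two partial mappings reduces to comparing the values assigned to shared variables. A minimal sketch, assuming that assignment sets are represented as Python dictionaries from variable names to values (a representation chosen for this example, not taken from the report):

```python
def consistent(assignment_a, assignment_b):
    """Two assignment sets are consistent iff no shared variable receives different values."""
    shared = assignment_a.keys() & assignment_b.keys()
    return all(assignment_a[v] == assignment_b[v] for v in shared)


# Assignment sets from Example 2.2.
a_to_aXB = {"X": "X", "V": "B"}   # induced by a(X,V) -> a(X,B)
b_to_bAB = {"U": "A", "V": "B"}   # induced by b(U,V) -> b(A,B)
b_to_bAD = {"U": "A", "V": "D"}   # induced by b(U,V) -> b(A,D)
```

With these values, `consistent(a_to_aXB, b_to_bAB)` holds, whereas `a_to_aXB` and `b_to_bAD` disagree on `V`.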

The importance of partial mappings lies in the duality that such mappings enjoy with containment mappings. More precisely, if there is a function $f$ and atoms $s$ and $t$ such that $f(s) = t$, then (by definition) there is a partial mapping $s \to t$; in addition, the existence of a partial mapping $s \to t$ uniquely defines a function $f$ such that $f(s) = t$. Further, the assignment set induced by the partial mapping is merely an extensional definition of the function $f$.

Example 2.4 Consider the queries $C_3$ and $C_4$ of Example 2.1. The function $f$ defined by $f(U) = A$, $f(V) = B$ maps $b(U,V)$ to $b(A,B)$ (that is, $f(b(U,V)) = b(A,B)$). The partial mapping $b(U,V) \to b(A,B)$ exists, and the assignment set induced by the partial mapping is $\{U := A, V := B\}$. □

These observations lead us to the following characterization for conjunctive query containment.

Definition 2.2 Let $C_1$ and $C_2$ be the generic conjunctive queries of the preceding subsection. We say that a conjunct mapping $M$ from $C_2$ into $C_1$ (written $M : C_2 \to C_1$) is a sequence $\langle m_0, \ldots, m_l \rangle$ of (not necessarily distinct) atoms in $C_1$ such that the following are all true.

1. $m_0$ is $a_0(U_0)$.

2. For $1 \le j \le l$, $m_j$ is $a_i(U_i)$ for some $i$, $1 \le i \le m$.

3. For each $0 \le j \le l$, there is a partial mapping $b_j(W_j) \to m_j$.

4. Each pair of partial mappings is consistent.

For each $j$, $m_j$ is the destination of $b_j(W_j)$ under the conjunct mapping. The partial mapping $b_0(W_0) \to a_0(U_0)$ is termed the head mapping. □

The  value  of  conjunct  mappings  is  illustrated  by  the  following  theorem. 

Theorem 2.3 (Conjunct mapping theorem) For any conjunctive queries $C_1$ and $C_2$, there is a containment mapping from $C_2$ into $C_1$ iff there is a conjunct mapping from $C_2$ into $C_1$.

Corollary $C_1 \subseteq C_2$ iff there is a conjunct mapping $M : C_2 \to C_1$.

Proof. Let $C_1$ and $C_2$ be as above. If there is a containment mapping $f : C_2 \to C_1$, then the sequence $\langle f(b_0(W_0)), \ldots, f(b_l(W_l)) \rangle$ defines a conjunct mapping from $C_2$ into $C_1$. Conversely, assume that $M$ is a conjunct mapping from $C_2$ into $C_1$. The function defined by the union of the assignment sets induced by the partial mappings $b_j(W_j) \to m_j$, for $0 \le j \le l$, is a containment mapping from $C_2$ into $C_1$. The proof of the corollary follows by the containment mapping theorem. □


Example 2.5 Consider the conjunctive queries $C_3$ and $C_4$ of Example 2.1. There is a conjunct mapping $M : C_4 \to C_3$ in which the destination of $p(X)$ is $p(X)$, of $a(X,V)$ is $a(X,B)$, of $b(U,V)$ is $b(A,B)$ and of $c(U,W)$ is $c(A,D)$. □
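By the conjunct mapping theorem, containment can be decided by searching for a sequence of destinations whose induced assignment sets agree. The following backtracking search is a sketch under the same assumptions as before (function-free atoms encoded as `(predicate, arguments)`, upper-case symbols of the source query treated as variables); it is exponential in the worst case and is not one of the report's algorithms.

```python
def is_variable(symbol):
    return symbol[:1].isupper()  # assumed naming convention for variables


def partial_mapping(s, t):
    """Assignment set induced by s -> t, or None if there is no partial mapping."""
    (p, s_args), (q, t_args) = s, t
    if p != q or len(s_args) != len(t_args):
        return None
    assignment = {}
    for a, b in zip(s_args, t_args):
        if is_variable(a):
            if assignment.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return assignment


def merge(a1, a2):
    """Union of two assignment sets, or None if they are inconsistent."""
    merged = dict(a1)
    for var, value in a2.items():
        if merged.setdefault(var, value) != value:
            return None
    return merged


def find_conjunct_mapping(c_from, c_to):
    """Return the destinations [m_0, ..., m_l] of a conjunct mapping c_from -> c_to, or None."""
    (head_from, body_from), (head_to, body_to) = c_from, c_to
    head = partial_mapping(head_from, head_to)  # the head mapping
    if head is None:
        return None

    def extend(j, assignment, chosen):
        if j == len(body_from):
            return chosen
        for dest in body_to:
            induced = partial_mapping(body_from[j], dest)
            merged = None if induced is None else merge(assignment, induced)
            if merged is not None:
                result = extend(j + 1, merged, chosen + [dest])
                if result is not None:
                    return result
        return None

    return extend(0, head, [head_to])


C3 = (("p", ("X",)), [("a", ("X", "B")), ("b", ("A", "B")), ("b", ("C", "B")),
                      ("b", ("D", "D")), ("c", ("B", "B")), ("c", ("C", "B")),
                      ("c", ("A", "D"))])
C4 = (("p", ("X",)), [("a", ("X", "V")), ("b", ("U", "V")), ("c", ("U", "W"))])
```

Calling `find_conjunct_mapping(C4, C3)` recovers the destinations listed in Example 2.5, and `find_conjunct_mapping(C3, C4)` returns `None`, in agreement with Example 2.1.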

2.2.3 The $k$-containment problem

We  extend  the  conjunctive  query  containment  problem  by  permitting  the  placement  of 
restrictions  on  the  destinations  that  a  conjunct  mapping  may  include. 

Definition 2.3 Let $C_1$ and $C_2$ be the generic conjunctive queries of Section 2.2.1, and let $\mathcal{D}$ be the sequence $\langle D_0, D_1, \ldots, D_l \rangle$, where

1. $D_0 = \{a_0(U_0)\}$, and

2. For $1 \le j \le l$, $D_j \subseteq \{a_i(U_i) \mid 1 \le i \le m\}$.

We say that $C_1 \subseteq C_2$ under $\mathcal{D}$ (written $C_1 \overset{\mathcal{D}}{\subseteq} C_2$) if there is a conjunct mapping $M : C_2 \to C_1$ such that $m_j \in D_j$ for all $j$. Determining such a containment is an instance of the distinguished-destination problem. The $D_j$ are termed destination sets, since they limit the possible destinations of each atom under a conjunct mapping. If any destination set $D_j$ is empty, then $C_1 \overset{\mathcal{D}}{\not\subseteq} C_2$.

[Figure 2.1: Distinguished-destination instance. The figure shows the queries $C_3$ and $C_4$ of Example 2.1, with arrows from each atom of $C_4$ to its allowed destinations in $C_3$.]

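Similarly, if we are given conjunctive queries $C_1$ and $C_2$, and destination sets $\mathcal{D}_1$ and $\mathcal{D}_2$, then we may define the equivalence of $C_1$ and $C_2$ under $\mathcal{D}_1$ and $\mathcal{D}_2$ (written $C_1 \overset{\mathcal{D}_1,\mathcal{D}_2}{\equiv} C_2$) to be $C_1 \overset{\mathcal{D}_1}{\subseteq} C_2$ and $C_2 \overset{\mathcal{D}_2}{\subseteq} C_1$. □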


Example 2.6 Consider the conjunctive queries $C_3$ and $C_4$ of Example 2.1, and define $\mathcal{D}$ by the arrows in Figure 2.1. $\langle C_3, C_4, \mathcal{D} \rangle$ is a distinguished-destination instance. Note that the atom $b(D,D)$ is not an allowed destination for $b(U,V)$, but is an allowed destination for $c(U,W)$. □

We further parametrize the problem by the maximum cardinality of the destination sets $D_j$.

Definition 2.4 An instance of the $k$-containment problem is an instance of the distinguished-destination problem in which $|D_j| \le k$ for all $j$; that is, no destination set has more than $k$ elements. □


Example 2.7 The problem of Example 2.6 is an instance of the 4-containment problem. □

2.2.4  Pruning 

Given conjunctive queries $C_1$ and $C_2$ (the generic conjunctive queries of Section 2.2.1) and a set $\mathcal{D}$ of destination sets, we may prune the destination sets in $\mathcal{D}$ as follows. For each $j$, let $D_j$ be the set $\{d_{jq} \mid 1 \le q \le n_j\}$, where $n_j$ is the cardinality of $D_j$.

Definition 2.5 We say that the distinguished-destination problem $\langle C_1, C_2, \mathcal{D} \rangle$ is pruned iff

1. For all $p, s$, if $d_{ps} \in D_p$ then $b_p(W_p) \to d_{ps}$; that is, there is a partial mapping from every atom in $C_2$ to each of its allowed destinations; and

2. For all $p, s$, if $d_{ps} \in D_p$, then for all $q$ there is a $t$ such that $d_{qt} \in D_q$ and the partial mapping $b_p(W_p) \to d_{ps}$ is consistent with the partial mapping $b_q(W_q) \to d_{qt}$; that is, every allowed destination for some atom is consistent with at least one allowed destination for every other atom.

□

[Figure 2.2: Pruning. The figure shows the queries $C_3$ and $C_4$ with the pruned destination sets of Example 2.8.]


Define any $d_{ps} \in D_p$ to be a violation of Class 1 if it violates condition (1), and a violation of Class 2 if it violates condition (2). If $d_{ps}$ is a violation of Class 1 or 2, then by the definition of a conjunct mapping, the destination of $b_p(W_p)$ cannot be $d_{ps}$ under any conjunct mapping. Hence, the removal of a Class 1 or Class 2 violation $d_{ps}$ from the destination set $D_p$ does not affect the existence of a containment. We prune the destination sets $D_j$ by iteratively removing all violations to produce a set $\mathcal{D}'$ such that $C_1 \overset{\mathcal{D}}{\subseteq} C_2$ iff $C_1 \overset{\mathcal{D}'}{\subseteq} C_2$.


Example 2.8 Consider the distinguished-destination instance of Example 2.6. There is no partial mapping $c(U,W) \to b(D,D)$ (since $b$ and $c$ are different predicates), and the partial mapping $c(U,W) \to c(B,B)$ is inconsistent with every choice of destination for $b(U,V)$. The pruned distinguished-destination instance is shown in Figure 2.2. Note that the instance is now a 2-containment problem. □


Algorithm 2.1
INPUT: $C_1$, $C_2$, $\mathcal{D}$ as above.
OUTPUT: $\mathcal{D}'$, containing no violations, such that $C_1 \overset{\mathcal{D}'}{\subseteq} C_2$ iff $C_1 \overset{\mathcal{D}}{\subseteq} C_2$.

(1)  change ← true
(2)  while change
(3)    change ← false
(4)    for 0 ≤ j ≤ l
(5)      for s ∈ D_j
           % If s is a Class 1 violation, remove it.
(6)        if there is no partial mapping b_j(W_j) → s
(7)          D_j ← D_j − {s}; change ← true
(8)        else
             % If s is a Class 2 violation, remove it.
(9)          violation ← false
(10)         for 0 ≤ i ≤ l
(11)           temp ← true
(12)           for t ∈ D_i
(13)             if b_j(W_j) → s is consistent with b_i(W_i) → t
(14)               temp ← false
(15)           violation ← violation ∨ temp
(16)         if violation
(17)           D_j ← D_j − {s}
(18)           change ← true

$\mathcal{D}'$ is the set of the resulting $D_j$. □

The correctness of the algorithm follows from the fact that the removal of a violation cannot affect the existence of a containment. Termination is guaranteed since each assignment of true to change inside the loop accompanies the deletion of a destination from a destination set, and only finitely many destinations can be deleted. The algorithm is easily seen to run in polynomial time, and in LOGSPACE for Datalog queries by Theorems 2.1 and 2.2.

In the remainder of this chapter, we assume that all distinguished-destination instances have been pruned.
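The pruning loop can be sketched in a few lines for function-free atoms. The code below follows the structure of Algorithm 2.1 but is only an illustration: the atom encoding, the upper-case-variable convention and all function names are assumptions of this example, and the destination sets used for the demonstration are a guess at the arrows of Figure 2.1 that is merely compatible with Examples 2.6 to 2.8.

```python
def is_variable(symbol):
    return symbol[:1].isupper()  # assumed naming convention for variables


def partial_mapping(s, t):
    """Assignment set induced by s -> t, or None if there is no partial mapping."""
    (p, s_args), (q, t_args) = s, t
    if p != q or len(s_args) != len(t_args):
        return None
    assignment = {}
    for a, b in zip(s_args, t_args):
        if is_variable(a):
            if assignment.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return assignment


def consistent(a1, a2):
    return all(a1[v] == a2[v] for v in a1.keys() & a2.keys())


def has_consistent_partner(assignment, atom, dests):
    """True iff some destination of `atom` induces an assignment consistent with `assignment`."""
    for t in dests:
        other = partial_mapping(atom, t)
        if other is not None and consistent(assignment, other):
            return True
    return False


def prune(atoms, dest_sets):
    """Iteratively remove Class 1 and Class 2 violations; returns new destination sets."""
    sets = [list(d) for d in dest_sets]
    changed = True
    while changed:
        changed = False
        for j, atom in enumerate(atoms):
            for s in list(sets[j]):
                induced = partial_mapping(atom, s)
                violation = induced is None  # Class 1 violation
                if not violation:            # otherwise test for a Class 2 violation
                    violation = any(
                        not has_consistent_partner(induced, atoms[i], sets[i])
                        for i in range(len(atoms)))
                if violation:
                    sets[j].remove(s)
                    changed = True
    return sets


# Head and body atoms of C4, with assumed destination sets drawn from C3.
atoms = [("p", ("X",)), ("a", ("X", "V")), ("b", ("U", "V")), ("c", ("U", "W"))]
dest_sets = [[("p", ("X",))],
             [("a", ("X", "B"))],
             [("b", ("A", "B")), ("b", ("C", "B"))],
             [("b", ("D", "D")), ("c", ("B", "B")), ("c", ("C", "B")), ("c", ("A", "D"))]]
pruned = prune(atoms, dest_sets)
```

On this input the Class 1 violation `b(D,D)` and the Class 2 violation `c(B,B)` are removed from the last destination set, leaving at most two destinations per atom, as described in Example 2.8.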

2.2.5 Equivalence of the containment and distinguished-destination problems

It turns out that the conjunctive query containment and distinguished-destination problems are essentially the same problem, in the sense that these problems are polynomially equivalent.

Given conjunctive queries $C_1$ and $C_2$ and an integer $k$ such that no predicate appears more than $k$ times in the body of $C_1$, we may construct a $k$-containment instance by setting $D_0$ to be a singleton set containing the head of $C_1$, and setting each other $D_j$ to be the set of the atoms in the body of $C_1$ that have the same principal functor as $b_j$. That is, each atom in the body of $C_2$ is allowed to map to every occurrence of the same predicate in the body of $C_1$.

The  following  algorithm  performs  the  reduction  in  the  opposite  direction. 

Algorithm 2.2

INPUT: a $k$-containment instance $C_1$, $C_2$ and $\mathcal{D}$.

OUTPUT: conjunctive queries $C_1'$ and $C_2'$ such that no predicate appears more than $k$ times in the body of $C_1'$, and such that $C_1' \subseteq C_2'$ iff $C_1 \overset{\mathcal{D}}{\subseteq} C_2$.


(1) The heads of $C_1'$ and $C_2'$ are those of $C_1$ and $C_2$, respectively.¹

(2) Let the $j$th atom in the body of $C_2$ be $e(W_j)$. Since $\mathcal{D}$ is pruned, each member of the destination set $D_j$ must also have principal functor $e$. Let $D_j$ be $\{e(U_{jq}) \mid 1 \le q \le n_j\}$, where (by assumption) $n_j \le k$. Create a new predicate symbol $g_j$. Then, add the atom $g_j(W_j)$ to the body of $C_2'$, and an atom $g_j(U_{jq})$ to the body of $C_1'$ for $1 \le q \le n_j$.

□


Example 2.9 Consider the pruned distinguished-destination instance of Example 2.8. The algorithm produces the following queries.

$C_3'$: $p(X)$ :- $d(X,B), e(A,B), e(C,B), g(C,B), g(A,D)$.

$C_4'$: $p(X)$ :- $d(X,V), e(U,V), g(U,W)$.

□


The algorithm is clearly polynomial. In step 2, each predicate $g_j$ is made to appear at most $n_j \le k$ times in the body of $C_1'$, and thus no predicate appears more than $k$ times in the body of $C_1'$. Finally, to prove that $C_1 \overset{\mathcal{D}}{\subseteq} C_2$ iff $C_1' \subseteq C_2'$, we observe the following.

1. Both instances involve the same head mapping, yielding the same induced assignment set.

2. If the $j$th atom in the body of $C_2$ is $e(W_j)$ with destination set $D_j = \{e(U_{jq}) \mid 1 \le q \le n_j\}$, then for any $q$, there is a partial mapping $e(W_j) \to e(U_{jq})$ inducing the assignment set $S$ iff there is a partial mapping $g_j(W_j) \to g_j(U_{jq})$ inducing the assignment set $S$.
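The renaming step of Algorithm 2.2 is simple enough to sketch directly. The function below assumes a pruned instance, the `(predicate, arguments)` atom encoding used for illustration in this chapter, and fresh predicate names of the form `g1`, `g2`, ... that are assumed not to clash with existing predicates; none of these names come from the report.

```python
def rename_predicates(head1, head2, body2, dest_sets):
    """Build (C1', C2') from a pruned instance.

    body2[j] is the j-th body atom of C2 and dest_sets[j] its destination set in C1.
    """
    body1_new, body2_new = [], []
    for j, ((_, args), dests) in enumerate(zip(body2, dest_sets), start=1):
        fresh = f"g{j}"  # hypothetical fresh predicate symbol for the j-th atom
        body2_new.append((fresh, args))
        # Each allowed destination keeps its arguments but takes the fresh predicate.
        body1_new.extend((fresh, dest_args) for _, dest_args in dests)
    return (head1, body1_new), (head2, body2_new)


# The pruned instance of Example 2.8.
head = ("p", ("X",))
body_c4 = [("a", ("X", "V")), ("b", ("U", "V")), ("c", ("U", "W"))]
pruned_sets = [[("a", ("X", "B"))],
               [("b", ("A", "B")), ("b", ("C", "B"))],
               [("c", ("C", "B")), ("c", ("A", "D"))]]
c1_new, c2_new = rename_predicates(head, head, body_c4, pruned_sets)
```

Up to the choice of fresh names (`g1`, `g2`, `g3` here in place of `d`, `e`, `g`), the result matches the queries of Example 2.9.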

2.2.6 Complexity of $k$-containment

The $k$-containment problem is clearly in NP; merely guess a conjunct mapping and verify using the conjunct mapping theorem. It turns out that the $k$-containment problem and $k$SAT are essentially the same problem. That is, for $k \ge 2$, the $k$-containment problem is no harder than $k$SAT; since 2SAT is known to be polynomial, we may conclude that the 2-containment problem is also polynomial. In fact, the reduction may be performed in LOGSPACE for Datalog queries, and the 2-containment problem is therefore in NLOGSPACE (and hence NC [9]). Further, for $k \ge 3$, $k$SAT is no harder than the $k$-containment problem, and the 3-containment problem is therefore NP-complete.

¹To preserve the safety of the queries, we may create a new predicate $f$, and place the atom $f(U_0)$ in the body of $C_1'$ and the atom $f(W_0)$ in the body of $C_2'$.




[Figure 2.3: Testing the containment $C_3 \subseteq C_4$ in Example 2.9. The figure shows the atoms of $C_4$: $p(X)$ :- $d(X,V), e(U,V), g(U,W)$ with arrows to their allowed destinations in $C_3$: $p(X)$ :- $d(X,B), e(A,B), e(C,B), g(C,B), g(A,D)$. Each arrow is labelled with its Boolean variable ($t$, $y$, $z$, $w$, $r$, $s$) and its induced assignment set; the figure also lists the Class 1 and Class 2 clauses of the resulting 2SAT instance and its minimal satisfying truth assignments.]

Polynomial  time 

We will provide a polynomial-time reduction from any instance of the $k$-containment problem to an instance of $k$SAT, for $k \ge 2$. We assume that the conjunctive queries $C_1$ and $C_2$ are the generic queries described in Section 2.2.1.

The  basic  idea  is  that  we  carry  a  Boolean  variable  representing  each  choice  of  destination 
for  each  atom.  The  clauses  produced  are  of  two  kinds. 

1. Clauses that enforce the requirement that each atom in $C_2$ have a legal destination in $C_1$. We call such clauses Class 1 clauses.

2. Consistency constraints that disallow inconsistent pairs of partial mappings. Such clauses are termed Class 2 clauses.

Example  2.10  illustrates  the  construction. 

Example 2.10 Consider the pruned 2-containment instance of Example 2.9. Let the following Boolean variables indicate the following choices of destinations (see Figure 2.3).

Boolean variable    atom        destination
t                   p(X)        p(X)
y                   d(X,V)      d(X,B)
z                   e(U,V)      e(A,B)
w                   e(U,V)      e(C,B)
r                   g(U,W)      g(C,B)
s                   g(U,W)      g(A,D)

Recall that the Class 1 clauses represent the statement "Every atom in $C_4$ has a legal destination". Now, the only allowed destination for $p(X)$ is $p(X)$, yielding the Class 1 clause $\{t\}$. Similarly, the allowed destinations for $e(U,V)$ are $e(A,B)$ and $e(C,B)$; the corresponding Class 1 clause is $\{z + w\}$. The set of Class 1 clauses is $\{t\}\{y\}\{z + w\}\{r + s\}$.

The Class 2 clauses enforce consistency of the corresponding partial mappings. The mapping $e(U,V) \to e(A,B)$ is inconsistent with $g(U,W) \to g(C,B)$ (yielding the clause $\{\bar{z} + \bar{r}\}$), and the mapping $e(U,V) \to e(C,B)$ is inconsistent with $g(U,W) \to g(A,D)$ (yielding the clause $\{\bar{w} + \bar{s}\}$).

The 2SAT instance created is $\{t\}\{y\}\{z + w\}\{r + s\}\{\bar{z} + \bar{r}\}\{\bar{w} + \bar{s}\}$, with the two satisfying truth-assignments $t = y = z = s = 1$ and $t = y = w = r = 1$. The first such assignment gives a conjunct mapping as indicated by the heavy arrows in Figure 2.3. Note that in the general case, the maximum cardinality $k$ of the Class 1 clauses is the parameter of the $k$-containment instance, and Class 2 clauses always have cardinality 2. □

In general, more than one member of a Class 1 clause may be true in a satisfying truth assignment (signifying more than one possible destination for some atom). In this case, any one choice of destination suffices.

The  following  is  a  formal  statement  and  proof  of  the  algorithm. 

Algorithm 2.3

INPUT: a pruned $k$-containment instance $\langle C_1, C_2, \mathcal{D} \rangle$, with $k \ge 2$.

OUTPUT: a $k$SAT instance $\mathcal{I}$ that is satisfiable iff there is a $k$-containment.

(1) If any destination set $D_j$ is empty, output an unsatisfiable instance.

(2) Create $(l+1)k$ Boolean variables $\{x_{ji} \mid 0 \le j \le l,\ 1 \le i \le k\}$ and $(l+1)k$ set variables $\{A_{ji} \mid 0 \le j \le l,\ 1 \le i \le k\}$; the former will be used to construct the $k$SAT instance, and the latter to hold induced assignment sets. Each destination set $D_j$ is $\{d_{ji} \mid 1 \le i \le n_j\}$, where by assumption $n_j \le k$.

(3) for 0 ≤ j ≤ l
(4)   add the clause $\{x_{j1} + \ldots + x_{jn_j}\}$ to $\mathcal{I}$
(5)   for 1 ≤ q ≤ n_j
(6)     $A_{jq}$ ← the assignment set induced by the partial mapping $b_j(W_j) \to d_{jq}$
(7) for 0 ≤ j < i ≤ l
(8)   for 1 ≤ q ≤ n_j, 1 ≤ p ≤ n_i
(9)     if $A_{jq}$ is inconsistent with $A_{ip}$
(10)      add $\{\bar{x}_{jq} + \bar{x}_{ip}\}$ to $\mathcal{I}$
□

The algorithm is clearly polynomial. As before, we term the clauses that are added to $\mathcal{I}$ at Step 4 Class 1 clauses, and those that are added at Step 10 Class 2 clauses. Since Class 1 clauses have cardinality at most $n_j \le k$ and Class 2 clauses are doubletons, $\mathcal{I}$ is a $k$SAT instance.

By the form of the Class 1 clauses, we may observe that in any satisfying truth-assignment for $\mathcal{I}$, some $x_{jq}$ is true for each $j$. Define a satisfying truth-assignment to be minimal iff exactly one $x_{jq}$ is true for each $j$.
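The clause construction of Algorithm 2.3 can be illustrated on the instance of Example 2.10. The sketch below assumes a pruned, function-free instance with non-empty destination sets; it represents a Boolean variable as a pair `(j, q)` and a literal as `(variable, polarity)`, and it checks satisfiability by brute force purely for demonstration (the names and representations are assumptions of this example, and a polynomial 2SAT procedure would be used in practice).

```python
from itertools import combinations, product


def is_variable(symbol):
    return symbol[:1].isupper()  # assumed naming convention for variables


def partial_mapping(s, t):
    """Assignment set induced by s -> t, or None if there is no partial mapping."""
    (p, s_args), (q, t_args) = s, t
    if p != q or len(s_args) != len(t_args):
        return None
    assignment = {}
    for a, b in zip(s_args, t_args):
        if is_variable(a):
            if assignment.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return assignment


def consistent(a1, a2):
    return all(a1[v] == a2[v] for v in a1.keys() & a2.keys())


def to_sat(atoms, dest_sets):
    """Return (variables, clauses) for a pruned instance with non-empty destination sets."""
    variables, clauses, induced = [], [], {}
    for j, (atom, dests) in enumerate(zip(atoms, dest_sets)):
        clauses.append([((j, q), True) for q in range(len(dests))])  # Class 1 clause
        for q, dest in enumerate(dests):
            variables.append((j, q))
            induced[(j, q)] = partial_mapping(atom, dest)
    for u, v in combinations(variables, 2):
        if u[0] != v[0] and not consistent(induced[u], induced[v]):
            clauses.append([(u, False), (v, False)])                  # Class 2 clause
    return variables, clauses


def satisfying_assignments(variables, clauses):
    """Brute-force enumeration; exponential, for small illustrations only."""
    for values in product([False, True], repeat=len(variables)):
        truth = dict(zip(variables, values))
        if all(any(truth[var] == pol for var, pol in clause) for clause in clauses):
            yield truth


atoms = [("p", ("X",)), ("d", ("X", "V")), ("e", ("U", "V")), ("g", ("U", "W"))]
dest_sets = [[("p", ("X",))], [("d", ("X", "B"))],
             [("e", ("A", "B")), ("e", ("C", "B"))],
             [("g", ("C", "B")), ("g", ("A", "D"))]]
variables, clauses = to_sat(atoms, dest_sets)
solutions = list(satisfying_assignments(variables, clauses))
```

The variables `(0,0)`, `(1,0)`, `(2,0)`, `(2,1)`, `(3,0)`, `(3,1)` play the roles of $t$, $y$, $z$, $w$, $r$, $s$ in Example 2.10, and the construction yields four Class 1 clauses and two Class 2 clauses.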

Lemma 2.1 $\mathcal{I}$ is satisfiable iff it has a minimal satisfying truth-assignment.

Proof. If $\mathcal{I}$ has a minimal satisfying truth-assignment, then it is clearly satisfiable. For the converse, assume that $S$ is a satisfying truth-assignment for $\mathcal{I}$. By previous discussion, at least one member of each Class 1 clause is true under $S$. Arbitrarily pick one such member of each Class 1 clause, and set every other member to be false. Such a procedure cannot make either a Class 1 clause or a Class 2 clause untrue if it was true under $S$ (each Class 1 clause retains a true member, and Class 2 clauses contain only negated variables); hence, the result is a minimal satisfying truth-assignment for $\mathcal{I}$. □


Lemma 2.2 If $C_1 \overset{\mathcal{D}}{\subseteq} C_2$, then $\mathcal{I}$ is satisfiable.

Proof. If $C_1 \overset{\mathcal{D}}{\subseteq} C_2$, no destination set $D_j$ is empty, and the algorithm does not terminate at Step 1. Let $\langle d_{0s_0}, \ldots, d_{ls_l} \rangle$ be a conjunct mapping from $C_2$ into $C_1$, where $d_{js_j} \in D_j$ for all $j$. Construct a truth-assignment for $\mathcal{I}$ by setting $x_{js_j}$ to be true for each $j$, and all other variables to be false. Such an assignment satisfies all Class 1 clauses. Further, by the properties of a conjunct mapping, each Class 2 clause is satisfied as well. Hence, this truth-assignment satisfies $\mathcal{I}$. □


Lemma 2.3 If $\mathcal{I}$ is satisfiable, then $C_1 \overset{\mathcal{D}}{\subseteq} C_2$.

Proof. Assume $\mathcal{I}$ is satisfiable, and let $S$ be a minimal satisfying truth-assignment for $\mathcal{I}$. Assume that the variables that are true under $S$ are $x_{0s_0}, \ldots, x_{ls_l}$. Then $\langle d_{0s_0}, \ldots, d_{ls_l} \rangle$ is a conjunct mapping from $C_2$ into $C_1$ that obeys $\mathcal{D}$, since the Class 1 clauses enforce the requirement that $d_{js_j} \in D_j$, and the Class 2 clauses enforce the requirement that no two choices of destination are inconsistent. □

Note that a similar reduction may be performed from a $k$-equivalence problem $\langle C_1, C_2, \mathcal{D}_1, \mathcal{D}_2 \rangle$ to a $k$SAT instance $\mathcal{I}$. That is, we perform the reduction for $\langle C_1, C_2, \mathcal{D}_1 \rangle$ and $\langle C_2, C_1, \mathcal{D}_2 \rangle$ separately, to produce $k$SAT instances $\mathcal{I}_1$ and $\mathcal{I}_2$ respectively, where the variables in $\mathcal{I}_1$ and $\mathcal{I}_2$ are distinct; then, the conjunction of $\mathcal{I}_1$ and $\mathcal{I}_2$ yields a $k$SAT instance that is satisfiable iff $C_1 \overset{\mathcal{D}_1,\mathcal{D}_2}{\equiv} C_2$.

Theorem 2.4 Algorithm 2.3 is correct.

Corollary 1. The 2-containment (or 2-equivalence) problem is in P.

Corollary 2. The 2-containment (or 2-equivalence) problem is in NLOGSPACE (and hence NC) for Datalog queries.

Proof. By Lemmas 2.2 and 2.3. Corollary 1 follows because 2SAT is in P. Corollary 2 follows because 2SAT is complete for NLOGSPACE, and NLOGSPACE is in NC ([9]). □
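Corollary 1 relies on 2SAT being solvable in polynomial time. One standard way to see this, not described in the report itself, is the implication-graph method: each clause $(a + b)$ contributes the implications $\bar{a} \Rightarrow b$ and $\bar{b} \Rightarrow a$, and the instance is unsatisfiable exactly when some variable lies in the same strongly connected component as its negation. The following sketch uses Kosaraju's algorithm; the literal encoding `(variable_index, is_positive)` and the function name are assumptions of this example.

```python
def solve_2sat(num_vars, clauses):
    """Return a satisfying list of booleans, or None if the instance is unsatisfiable.

    Each clause is a list of one or two literals (variable_index, is_positive).
    """
    n = 2 * num_vars
    node = lambda var, positive: 2 * var + (0 if positive else 1)
    graph = [[] for _ in range(n)]
    reverse = [[] for _ in range(n)]
    for clause in clauses:
        a, b = clause[0], clause[-1]          # a unit clause {a} is treated as {a + a}
        for x, y in ((a, b), (b, a)):         # add the implication (not x) => y
            u, w = node(x[0], not x[1]), node(y[0], y[1])
            graph[u].append(w)
            reverse[w].append(u)

    # First pass: record nodes in order of DFS completion (iterative DFS).
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                w = graph[u][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(u)

    # Second pass: components on the reversed graph, numbered in topological order.
    comp, count = [-1] * n, 0
    for start in reversed(order):
        if comp[start] != -1:
            continue
        comp[start] = count
        stack = [start]
        while stack:
            u = stack.pop()
            for w in reverse[u]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1

    if any(comp[2 * v] == comp[2 * v + 1] for v in range(num_vars)):
        return None
    # A variable is true when its positive literal comes later in topological order.
    return [comp[2 * v] > comp[2 * v + 1] for v in range(num_vars)]


# The 2SAT instance of Example 2.10, with t, y, z, w, r, s numbered 0..5.
example = [[(0, True)], [(1, True)], [(2, True), (3, True)], [(4, True), (5, True)],
           [(2, False), (4, False)], [(3, False), (5, False)]]
solution = solve_2sat(6, example)
```

The returned assignment is one of the two satisfying assignments of Example 2.10; which one is returned depends on the traversal order, so the sketch makes no claim about that.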

Related work Sagiv and Yannakakis ([29]) propose an algorithm to decide containment among function-free conjunctive queries $C_1$ and $C_2$ in which each atom in $C_2$ has only two destinations consistent with the head mapping; the primary differences between their algorithm and Algorithm 2.3 are that their algorithm is based on function-free queries, and performs pruning only in terms of atoms whose destinations are inconsistent with the head mapping. Related work on polynomial-time algorithms includes the minimization algorithms of Aho et al. ([3, 4]) and Johnson and Klug ([19]). The most general result among these three papers is that of Johnson and Klug, who consider "fanout-free" conjunctive queries; they test for equivalence between two queries by minimizing each query, and then determining whether the minimal queries are isomorphic. The conjunctive query

$C_1$: $p(X)$ :- $a(X,C), b(B,C), b(C,C), c(B,D), c(C,D), c(C,C)$.

is not fanout-free. However, $C_1$ is equivalent to the conjunctive query

$C_2$: $p(X)$ :- $a(X,U), b(U,U), c(U,U)$.

as may be verified through two uses of Algorithm 2.3.


NP-completeness

We now show that the k-containment problem is NP-complete for k ≥ 3, for a restricted
class of conjunctive queries. We will begin by defining the concept of a valid labelling for a
kSAT instance, and showing that a kSAT instance is satisfiable iff it has a valid labelling.
We will then reduce the problem of finding a valid labelling for a given kSAT instance to
that of solving a k-containment instance.
3SAT instance: c1 = {x1 + x2 + x3}   c2 = {¬x1 + ¬x4 + x5}

Satisfying truth assignment: x1 = true, x4 = false; x2, x3, x5 arbitrary.

Valid labelling: << T11, D12, D13 > < D21, T22, D23 >>

Figure 2.4: Valid labelling.

Valid labelling Let I be a kSAT instance consisting of the p clauses c1, ..., cp over the q
variables x1, ..., xq. Without loss of generality, we assume that I contains no tautological
clauses (clauses that contain a pair of complementary literals); such clauses do not affect
the satisfiability of I, and may be removed in polynomial time. We also assume that no
literal appears more than once in any clause; literal repetitions may similarly be removed in
polynomial time. For all i, let the ith clause have n_i ≤ k literals.¹ We will use the notation
lij to denote the jth literal in clause ci.

Let {Tij | 1 ≤ i ≤ p, 1 ≤ j ≤ n_i} ∪ {Dij | 1 ≤ i ≤ p, 1 ≤ j ≤ n_i} be a set of at most 2kp
distinct constants. A labelling A of I is a p-vector of tuples < A[i1], ..., A[in_i] >, where
A[ij] is either Tij or Dij for each i and j. Intuitively, the assignment of Tij to A[ij]
represents a satisfying truth assignment under which the literal lij is “true”; similarly,
Dij denotes a “don't care” for the value of lij.

A labelling of I is valid if

1. For each i, there is exactly one j such that A[ij] is Tij; that is, exactly one literal in
each clause is “true”; and

2. For any A[ij] and A[ml], i ≠ m, if lij and lml are complementary literals, then either
A[ij] = Dij or A[ml] = Dml (or both); that is, no pair of complementary literals in
different clauses are both true.


Example 2.11

Consider the 3SAT instance I consisting of the two clauses c1 = {x1 + x2 + x3} and
c2 = {¬x1 + ¬x4 + x5}. The sequence << T11, D12, D13 > < D21, T22, D23 >> is a valid labelling
for I; that is, we set the first literal in clause c1 and the second literal in clause c2 to “true”,
and set all other literals to “don't care” (see Figure 2.4). □
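The two conditions in the definition of a valid labelling translate directly into a checker. The sketch below is illustrative only: the encoding of literals as signed integers (x3 as 3, its negation as -3), the use of the strings "T" and "D" for Tij and Dij, and the name `is_valid_labelling` are all assumptions, and the second clause of the example is read as {¬x1 + ¬x4 + x5}.

```python
def is_valid_labelling(clauses, labelling):
    """Check conditions 1 and 2 of the definition of a valid labelling.

    clauses:   list of clauses, each a list of nonzero ints (x3 -> 3, not-x3 -> -3)
    labelling: list of lists of "T"/"D", one entry per literal occurrence
    """
    if len(labelling) != len(clauses):
        return False
    for clause, labels in zip(clauses, labelling):
        if len(labels) != len(clause) or any(x not in ("T", "D") for x in labels):
            return False
        # Condition 1: exactly one literal per clause is labelled "true".
        if labels.count("T") != 1:
            return False
    # Condition 2: complementary literals in different clauses are never both "true".
    for i in range(len(clauses)):
        for m in range(i + 1, len(clauses)):
            for j, lit in enumerate(clauses[i]):
                for l, other in enumerate(clauses[m]):
                    if lit == -other and labelling[i][j] == "T" and labelling[m][l] == "T":
                        return False
    return True


instance = [[1, 2, 3], [-1, -4, 5]]  # c1 = {x1 + x2 + x3}, c2 = {-x1 + -x4 + x5}
print(is_valid_labelling(instance, [["T", "D", "D"], ["D", "T", "D"]]))
print(is_valid_labelling(instance, [["T", "D", "D"], ["T", "D", "D"]]))
```

The first labelling is the one given in the example; the second marks both x1 in c1 and ¬x1 in c2 as “true” and is rejected by condition 2.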


¹That is, I is an instance of at-most-k SAT.
It turns out that the existence of a valid labelling is necessary and sufficient for the
existence of a satisfying truth-assignment.

Theorem 2.5 A kSAT instance is satisfiable iff it has a valid labelling.

Proof. Consider a kSAT instance I as above. If I is satisfiable, then it has some satisfying
truth-assignment S. Construct a labelling A as follows: if any literal lij is true under S,
then A[ij] is Tij; otherwise, it is Dij. Since S satisfies each clause in I, at least one A[ij] is
Tij for each i. Now, for each i, select any A[ij] that is assigned Tij, and assign Dim to every
other A[im]. No two complementary literals are both true under S, and replacing T-values
by D-values cannot create violations of requirement 2 in the definition of a valid labelling,
so the result is a valid labelling for I.

For the converse, assume that A is a valid labelling for I. Construct a truth assignment
S for I as follows: for every literal lij such that A[ij] is Tij, if lij is the positive literal xp
then set xp to true under S, and if it is the negative literal ¬xp, then set xp to false; set all
unassigned variables to false. This procedure assigns a unique value to each variable in I,
since a valid labelling never assigns T-values to complementary literals. Since at least one
literal in each clause is true under S, S is a satisfying truth-assignment for I. □
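Both directions of the proof are constructive, and can be sketched as follows. The encoding (signed integers for literals, "T"/"D" for the labels, a dict from variable index to truth value for S) and the function names are assumptions made for illustration; the first function always keeps the first true literal of each clause, which is one admissible choice among the "select any" options in the proof.

```python
def labelling_from_assignment(clauses, assignment):
    """Forward direction: build a valid labelling from a satisfying assignment."""
    labelling = []
    for clause in clauses:
        true_positions = [j for j, lit in enumerate(clause)
                          if assignment[abs(lit)] == (lit > 0)]
        if not true_positions:
            raise ValueError("assignment does not satisfy every clause")
        chosen = true_positions[0]  # keep one T per clause, demote the rest to D
        labelling.append(["T" if j == chosen else "D" for j in range(len(clause))])
    return labelling


def assignment_from_labelling(clauses, labelling):
    """Converse direction: read a truth assignment off a valid labelling."""
    # Variables not forced by a T-label default to false, as in the proof.
    assignment = {abs(lit): False for clause in clauses for lit in clause}
    for clause, labels in zip(clauses, labelling):
        lit = clause[labels.index("T")]
        assignment[abs(lit)] = lit > 0
    return assignment


def satisfies(clauses, assignment):
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


instance = [[1, 2, 3], [-1, -4, 5]]
S = {1: True, 2: False, 3: False, 4: False, 5: False}
A = labelling_from_assignment(instance, S)
print(A)
print(satisfies(instance, assignment_from_labelling(instance, A)))
```

The converse function assumes its input labelling is valid; it does not re-check conditions 1 and 2.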

The reduction The idea of a valid labelling permits a reduction from a kSAT instance
I to a k-containment instance as illustrated below.

Example 2.12 Consider the 3SAT instance I of Example 2.11, consisting of the two clauses
c1 = {x1 + x2 + x3} and c2 = {¬x1 + ¬x4 + x5}. We construct conjunctive queries C1 and C2
as follows.

Let us use the Datalog variable Lij to represent the jth literal in clause ci. For example,
L11 represents the first literal in clause 1; that is, the (occurrence of) x1 in c1. Similarly, L23
represents the literal x5 in clause c2. Further, the Datalog variables Tij and Dij respectively
represent a choice of “true” or “don't care” for the jth literal in clause ci.

The head of each of C1 and C2 will be the 0-ary predicate h (see Figure 2.5).

To represent the clause c1, we construct the atom c1(L11, L12, L13) in the body of C2,
and the three possible destinations c1(T11, D12, D13), c1(D11, T12, D13), c1(D11, D12, T13)
in the body of C1. Note that any containment mapping from C2 into C1 enforces the fact
that exactly one literal in clause c1 is “true”, as required by part (1) of the definition of a
valid labelling (see Figure 2.5). We similarly construct an occurrence of a predicate c2 in C2
and three occurrences of this predicate in C1 to represent the clause c2.

Finally, we must impose requirement (2) in the definition of a valid labelling: that is,
that no two complementary occurrences of any literal are both assigned “true”. In our
example, there is only one such violation: the first literal in c1 and the first literal in c2 are
complementary. Hence, we construct a new “enforcer” predicate e1121, and add the atom
e1121(L11, L21) to the body of C2 and the three atoms e1121(T11, D21), e1121(D11, T21) and
e1121(D11, D21) to the body of C1. The completed instance is shown in Figure 2.5.

Note that in general, the number of predicate repetitions added in the first stage is the
maximum cardinality of any clause in the kSAT instance (i.e. k), and the number added in
the second stage is always 3. □
3SAT instance: c1 = {x1 + x2 + x3}   c2 = {¬x1 + ¬x4 + x5}

Satisfying truth assignment: x1 = true, x4 = false; x2, x3, x5 arbitrary.

Valid labelling: << T11, D12, D13 > < D21, T22, D23 >>

C1 : h :- c1(T11, D12, D13), c2(T21, D22, D23), e1121(T11, D21),
          c1(D11, T12, D13), c2(D21, T22, D23), e1121(D11, T21),
          c1(D11, D12, T13), c2(D21, D22, T23), e1121(D11, D21).

C2 : h :- c1(L11, L12, L13), c2(L21, L22, L23), e1121(L11, L21).

Figure 2.5: Illustrating the construction.
We formalise the procedure as follows.

Let I be a kSAT instance, for k ≥ 3. We construct conjunctive queries C1 and C2
such that no predicate appears more than k times in either query, such that no variable
is repeated in any conjunct, and such that C1 ⊆ C2 iff I has a valid labelling. Further,
the construction will yield queries C1 and C2 such that C2 ⊆ C1; hence, C1 = C2 iff I is
satisfiable. The time expended in the reduction is polynomial in the size of C1 and C2.

Algorithm 2.4

INPUT: A kSAT instance I

OUTPUT: Conjunctive queries C1 and C2 as above, such that C1 ⊆ C2 (C1 = C2)
iff I is satisfiable

(1) C1 and C2 each has the rule head h (with no arguments). That is, the relation for h is
    one of true and false.²

(2) for 1 ≤ i ≤ p

(3)     Consider the ith clause ci = {li1 + ... + lin_i}

(4)     Create new nondistinguished variables Li1, ..., Lin_i and the predicate symbol ci

(5)     Add to the body of C2 the atom ci(Li1, ..., Lin_i)

(6)     Add n_i ci-atoms to the body of C1, where the jth argument of the jth such atom
        is Tij and the rth argument is Dir for all r ≠ j

(7) for 1 ≤ i < j ≤ p

(8)     for 1 ≤ l ≤ n_i, 1 ≤ m ≤ n_j

(9)         if lil and ljm are complementary literals

(10)            create a new predicate constant eiljm

(11)            add eiljm(Lil, Ljm) to the body of C2

(12)            add eiljm(Til, Djm), eiljm(Dil, Tjm) and eiljm(Dil, Djm) to C1

□
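The steps of Algorithm 2.4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: literals are encoded as signed integers, atoms as tuples, and the generated names (`L1_2`, `T1_2`, `D1_2`, `e1_1_2_1`) use underscores to keep indices unambiguous; none of these conventions comes from the text.

```python
def reduce_to_containment(clauses):
    """Build conjunctive queries (C1, C2) from a kSAT instance.

    clauses: list of clauses, each a list of nonzero ints (x3 -> 3, not-x3 -> -3).
    Returns two queries of the form (head, body), with atoms as tuples.
    """
    head = ("h",)                      # step (1): 0-ary head
    body1, body2 = [], []
    for i, clause in enumerate(clauses, start=1):          # step (2)
        n = len(clause)
        pred = f"c{i}"                                     # step (4)
        body2.append((pred, *[f"L{i}_{j}" for j in range(1, n + 1)]))  # step (5)
        for j in range(1, n + 1):                          # step (6)
            body1.append((pred, *[f"T{i}_{r}" if r == j else f"D{i}_{r}"
                                  for r in range(1, n + 1)]))
    for i in range(1, len(clauses) + 1):                   # step (7)
        for j in range(i + 1, len(clauses) + 1):
            for l, lit in enumerate(clauses[i - 1], start=1):      # step (8)
                for m, other in enumerate(clauses[j - 1], start=1):
                    if lit == -other:                      # step (9)
                        pred = f"e{i}_{l}_{j}_{m}"         # step (10)
                        body2.append((pred, f"L{i}_{l}", f"L{j}_{m}"))  # step (11)
                        body1.append((pred, f"T{i}_{l}", f"D{j}_{m}"))  # step (12)
                        body1.append((pred, f"D{i}_{l}", f"T{j}_{m}"))
                        body1.append((pred, f"D{i}_{l}", f"D{j}_{m}"))
    return (head, body1), (head, body2)


C1, C2 = reduce_to_containment([[1, 2, 3], [-1, -4, 5]])
for atom in C2[1]:
    print(atom)
print(len(C1[1]), "atoms in the body of C1")
```

For the two-clause instance of Example 2.12 this yields three atoms in the body of C2 and nine in the body of C1, matching Figure 2.5 up to the naming of variables.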

The algorithm is clearly polynomial. Define conjuncts that are added to the bodies of
C1 and C2 in Steps 5 and 6 to be Class 1 conjuncts, and those that are added at Steps 11
and 12 to be Class 2 conjuncts. C2 contains no repetitions of any predicate. C1 contains at
most k repetitions of Class 1 predicates, and three repetitions of Class 2 predicates. Note
that all conjuncts are rectified (that is, have no repeated arguments), and have arity at
most k.

²For a reader who is offended by a 0-ary predicate, identical results may be obtained by making h(X)
the head of each rule, and adding the atom a(X) to each rule body.
Lemma 2.4 There is a containment mapping f : C2 → C1 iff I has a valid labelling.

Proof. Assume f is a containment mapping from C2 into C1. Construct a labelling for I by
setting A[ij] to be f(Lij) for all i and j. The possible destinations for Class 1 atoms require
that condition 1 in the definition of a valid labelling is satisfied; similarly, the possible
destinations for Class 2 atoms enforce condition 2, and the labelling thus constructed is
valid.

Assume A is a valid labelling for I. Construct a function f on the variables of C2 by
setting f(Lij) to the value of A[ij] for all i and j, and extend f to atoms (see Section
2.2.1). Since f(h) = h, the head mapping exists. By condition 1 in the definition of a valid
labelling, and by construction, each Class 1 atom in C2 has exactly one possible destination
in C1 under f. The possible destinations for Class 2 atoms, along with condition 2 in the
definition of a valid labelling, ensure the functionality of f. □

Lemma 2.5 C2 ⊆ C1.

Proof. The function defined by f(Dij) = f(Tij) = Lij is a containment mapping from C1
into C2. □
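As a quick sanity check of Lemma 2.5 on the instance of Figure 2.5, the following sketch applies f(Dij) = f(Tij) = Lij to every atom in the body of C1 and confirms that the image lies in the body of C2. The tuple encoding and the string manipulation in `f` are assumptions of the sketch.

```python
# Bodies of C1 and C2 from Figure 2.5; the head h is 0-ary and maps to itself.
B1 = [("c1", "T11", "D12", "D13"), ("c1", "D11", "T12", "D13"), ("c1", "D11", "D12", "T13"),
      ("c2", "T21", "D22", "D23"), ("c2", "D21", "T22", "D23"), ("c2", "D21", "D22", "T23"),
      ("e1121", "T11", "D21"), ("e1121", "D11", "T21"), ("e1121", "D11", "D21")]
B2 = [("c1", "L11", "L12", "L13"), ("c2", "L21", "L22", "L23"), ("e1121", "L11", "L21")]


def f(term):
    # f(Tij) = f(Dij) = Lij: replace the leading T or D by L.
    return "L" + term[1:]


image = {(atom[0],) + tuple(f(t) for t in atom[1:]) for atom in B1}
print(image <= set(B2))
```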

Theorem  2.6  Algorithm  2.4  is  correct. 

Corollary. 3-containment (3-equivalence) is NP-complete.

Proof. By Algorithm 2.3, Theorem 2.4 and Theorem 2.5, we conclude that 3-containment
is polynomially equivalent to 3SAT, and is hence NP-complete. □

Related results Chandra and Merlin ([10]) have shown that testing conjunctive query
containment is NP-complete even for function-free queries, although their result assumes
up to six repetitions of each predicate in the bodies of the queries. Sagiv and Yannakakis
([29]) have shown that the problem is complete even if no predicate appears more than
three times in the body of either query; however, their reduction assumes the repetition of
variables in the arguments of some conjuncts.

2.3 Applications

Let us turn to the complexity of the optimization problems that we discussed in Chapter 1.
It turns out that the results of the previous section allow us to show that restricted versions
of the one-boundedness, sequencability and basis-linearizability problems are NP-hard, and
that the algorithms of Section 1.5.3 are polynomial.

2.3.1 Approach and notation

The NP-hardness results of this section are based on the following idea. Given any 3SAT
instance, the techniques of the preceding section permit us to construct conjunctive queries

C1 : h :- B1.

C2 : h :- B2.

such that C1 ⊆ C2 iff the 3SAT instance is satisfiable. That is, B1 and B2 are the bodies of
the conjunctive queries generated by Algorithm 2.4. Let us abuse notation by saying that
B1 ⊆ B2 whenever there is a conjunct mapping M : B2 ⇒ B1; that is, a consistent choice
of destinations (in B1) for the atoms in B2. Now, the conjunctive query C1 is contained
in the conjunctive query C2 iff there is a conjunct mapping from B2 into B1, since h (the
head of C1 and C2) is a 0-ary predicate. Hence, the original 3SAT instance is satisfiable iff
B1 ⊆ B2. Note also that by construction, there is a conjunct mapping B1 ⇒ B2; that is,
that B2 ⊆ B1.

Example 2.13 Consider the conjunctive queries C1 and C2 of Example 2.12. The conjunctions
B1 and B2 are as follows.

B1 : c1(T11, D12, D13), c1(D11, T12, D13), c1(D11, D12, T13),
     c2(T21, D22, D23), c2(D21, T22, D23), c2(D21, D22, T23),
     e1121(T11, D21), e1121(D11, T21), e1121(D11, D21).

B2 : c1(L11, L12, L13), c2(L21, L22, L23), e1121(L11, L21). □

Now, the NP-hardness results of the following subsections are obtained by embedding
the conjunctions B1 and B2 (perhaps with added arguments) into the bodies of recursive
rules, such that the resulting rules have desired properties iff B1 ⊆ B2. Let us adopt the
convention that for any variable X, X|Bi denotes the conjunction Bi in which each conjunct
is given the additional first argument X.

Example 2.14 Consider the conjunction B2 of Example 2.13. The rule

p(X) :- p(U), X|U|B2.

stands for the rule

p(X) :- p(U), c1(X,U,L11,L12,L13), c2(X,U,L21,L22,L23), e1121(X,U,L11,L21). □
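The X|U|B2 shorthand amounts to prepending arguments to every conjunct. A minimal sketch follows, assuming atoms are encoded as tuples; the helper name `add_arguments` is hypothetical.

```python
def add_arguments(prefix, conjunction):
    """Return the conjunction with the variables in `prefix` prepended to every atom."""
    return [(atom[0], *prefix, *atom[1:]) for atom in conjunction]


B2 = [("c1", "L11", "L12", "L13"), ("c2", "L21", "L22", "L23"), ("e1121", "L11", "L21")]

# Body of the rule p(X) :- p(U), X|U|B2.
body = [("p", "U")] + add_arguments(("X", "U"), B2)
for atom in body:
    print(atom)
```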

Finally, let us adopt the convention that in “unwinding” a recursion, every nondistinguished
variable in a rule is renamed by “priming” at each stage of the expansion; that is,
a nondistinguished variable U in the body of a rule is renamed U'...' (with i primes,
written U^(i)) for some i. For a linear recursion, the version of a nondistinguished variable
U at depth i + 1 is the variable U^(i).

Example 2.15 Consider the linear rule of Example 2.14. Figure 2.6 exhibits our notation
on expansions of p(X) using this rule. □
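The priming convention for a linear rule can be sketched as follows. The representation of a rule as a triple (head, recursive atom, list of EDB atoms) and the name `unwind` are assumptions of this sketch; primes are appended to the end of a variable name (`L11'`), whereas the figure attaches them to the letter (L'11).

```python
def unwind(rule, depth):
    """Return the atoms contributed at each level when a linear rule is unwound.

    rule = (head, rec, edb): head and rec are p-atoms, edb is a list of atoms.
    Level i (counting from 0) has its nondistinguished variables primed i times.
    """
    head, rec, edb = rule
    levels = []
    args = head[1:]  # current instantiation of the distinguished variables
    for i in range(depth):
        subst = dict(zip(head[1:], args))

        def rename(v):
            # Distinguished variables take the arguments of the p-atom being
            # expanded; nondistinguished variables receive i primes.
            return subst[v] if v in subst else v + "'" * i

        level = [(a[0], *map(rename, a[1:])) for a in [rec] + edb]
        levels.append(level)
        args = level[0][1:]  # arguments of the new recursive atom
    return levels


rule = (("p", "X"), ("p", "U"),
        [("c1", "X", "U", "L11", "L12", "L13"),
         ("c2", "X", "U", "L21", "L22", "L23"),
         ("e1121", "X", "U", "L11", "L21")])

for level in unwind(rule, 3):
    print(level)
```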

Let us return to the conjunctions B1 and B2 as described at the beginning of this section.
Let Bj^(i) denote the conjunction Bj (that is, one of B1 and B2) in which every variable is
primed i times. It is clear that if B1^(i) ⊆ B2^(j) for any i and j, then B1 ⊆ B2.

p(U), c1(X,U,L11,L12,L13), c2(X,U,L21,L22,L23), e1121(X,U,L11,L21)

p(U'), c1(U,U',L'11,L'12,L'13), c2(U,U',L'21,L'22,L'23), e1121(U,U',L'11,L'21)

p(U''), c1(U',U'',L''11,L''12,L''13), c2(U',U'',L''21,L''22,L''23), e1121(U',U'',L''11,L''21)

Figure 2.6: Priming.


2.3.2  One-boundedness 

Consider the safe, (not necessarily function-free) recursive rule

r1 : p(X) :- p(U1), ..., p(Un), a1(W1), ..., ak(Wk).

where X is a vector of distinct variables. Recall that the sirup r1 is said to be 1-bounded if
every top-down expansion of p(X) using r1 is contained in a top-down expansion of depth
at most 1. Recall also our convention that the top-down expansion representing the rule

ε : p(X) :- p(X).

has depth 0. If r1 is 1-bounded, then the program consisting of r1 and any basis rule of the
form

r2 : p(X) :- b(X).

may be reduced to a nonrecursive program, in which r1 is replaced by the rule

r1' : p(X) :- b(U1), ..., b(Un), a1(W1), ..., ak(Wk).

Kanellakis ([20]) has shown that deciding 1-boundedness is NP-hard for linear sirups defining
a predicate of arity four; however, the reduction involves an unbounded number of
repetitions of EDB predicates in the body of the sirup. We present the following result.

Theorem 2.7 Let r1 be a linear, Datalog, head-rectified sirup defining a binary predicate,
such that no EDB predicate appears more than four times in its body, or has arity greater
than five. Testing 1-boundedness is NP-hard for such rules.
ε :  p(X,Y)
     p(X,Y)

r :  p(X,Y)
     p(X,U), e(X,Y), e(V,Y), f(X,U), f(V,U), X|R|B1, V|S|B2

r² : p(X,Y)
     p(X,U), e(X,Y), e(V,Y), f(X,U), f(V,U), X|R|B1, V|S|B2
     p(X,U'), e(X,U), e(V',U), f(X,U'), f(V',U'), X|R'|B1', V'|S'|B2'

Figure 2.7: The expansions of Theorem 2.7

Proof. Given a 3SAT instance I, construct the conjunctive queries C1 : h :- B1. and
C2 : h :- B2. as in Algorithm 2.4. We construct a program P that is 1-bounded iff there
is a conjunct mapping M : B2 ⇒ B1. Recall that by construction, B1 ⇒ B2. P is the sirup

r : p(X,Y) :- p(X,U), e(X,Y), e(V,Y), f(X,U), f(V,U), X|R|B1, V|S|B2.

where X, Y, U, V, R and S are distinct variables not appearing in B1 or B2, and where p, e
and f are distinct predicate symbols not appearing in B1 or B2.

Consider the top-down expansions ε, r and r² (see Figure 2.7).

P is 1-bounded iff r² ⊆ ε or r² ⊆ r. In either case, the head mapping induces the
assignment set {X := X, Y := Y}. The nondistinguished variable U' appears in the p-atom
generated by r², so that the only choice of destination for the body of ε violates the head
mapping, and thus r² ⊄ ε.

Thus, if P is 1-bounded, then r² ⊆ r. The destination for the atom p(X,U) in r requires
that U := U'. Then, the possible destinations for the atom f(V,U) require that V := X or
V := V', and the possible destinations for e(V,Y) require that V := X or V := V. Thus,
the conjunct mapping M : r ⇒ r² must assign V := X, and we may conclude that there is
a conjunct mapping V|S|B2 ⇒ X|R|B1 or V|S|B2 ⇒ X|R'|B1', as desired.

For the converse, assume C1 ⊆ C2; then B2 ⇒ B1. We may select the destination
of X|R|B1 to be X|R|B1, and select the destination of every other atom in r as in the
previous paragraph, to obtain a conjunct mapping from r into r². By Theorem 1.7, P is
one-bounded. □

On the other hand, we may use Algorithm 2.3 to decide 1-boundedness in polynomial
time, for linear sirups in which no predicate is repeated in the body of the recursive rule
(i.e., n = 1 and the ai are distinct in r1). The reason is that testing 1-boundedness reduces
to two 2-containment tests.

Algorithm 2.5

INPUT: A sirup r1 as described above, with n = 1 and with no predicate repetitions among
the ai.

OUTPUT: “yes” if the sirup is 1-bounded, “no” otherwise.

1. Construct the empty rule ε : p(X) :- p(X)., and the top-down expansion r1r1.

2. Use Algorithm 2.3 to determine whether r1r1 ⊆ ε or r1r1 ⊆ r1. If either containment
holds, return “yes”; otherwise, return “no”.

□


Theorem  2.8  Algorithm  2.5  is  correct. 

Proof.  Necessity  follows  by  the  definition  of  1-boundedness,  and  sufficiency  by  Theorem 
1.7.  □ 


Theorem 2.9 Algorithm 2.5 runs in polynomial time for arbitrary sirups, and in
NLOGSPACE (and hence in NC) for Datalog queries.

Proof. Step (1) may be performed in polynomial time for arbitrary sirups, and in LOGSPACE
for Datalog queries. The proof follows by Theorem 2.4 and the fact that NLOGSPACE is
in NC ([9]). □

Finally, we show that Algorithm 1.1 (see Section 1.5.3) runs in polynomial time.

Theorem 2.10 Algorithm 1.1 runs in polynomial time.

Proof. Each containment in the algorithm is an instance of the 2-containment problem. □
2.3.3 Rule sequencability

Consider the safe, Datalog, linear rules

r1 : p(X) :- p(U0), a1(U1), ..., ak(Uk).

r2 : p(X) :- p(W0), b1(W1), ..., bl(Wl).

where X is a vector of distinct variables. Define ε to be the following rule,

ε : p(X) :- p(X).

Recall that r1 is sequencable under r2 if (r1 + r2)* ⊆ r2* r1*. Rule sequencability is not known
to be decidable. However, the problem is at least as hard as any problem in NP, as the
following theorem shows.

Theorem 2.11 Let r1 and r2 be rules as above, with the additional restrictions that no
predicate appears more than three times in the body of r1 or more than once in the body of
r2, and that all predicates have arity at most four. Detecting rule sequencability is NP-hard
for rules of this form.

Proof. Consider an arbitrary 3SAT instance I, and apply Algorithm 2.4 to obtain conjunctive
queries C1 : h :- B1 and C2 : h :- B2. We construct rules r1 and r2 such that C1 ⊆ C2
iff r1 is sequencable under r2. By the discussion of the beginning of this section, we know
that B1 ⇒ B2, and that C1 ⊆ C2 iff B2 ⇒ B1. The rules r1 and r2 are

r1 : p(X,Y,Z,W) :- p(Y,X,Z,W), X|B1.
r2 : p(X,Y,Z,W) :- p(X,Y,W,Z), X|B2.

where X, Y, Z and W are new, distinct variables and p is a new predicate symbol.

Note that the p-atom in the expansions r1² and r2² is p(X,Y,Z,W), so r1² and r2² are
contained in ε and each rule is 1-bounded.

Assume r1 is sequencable under r2. Then r1r2 is contained in some expansion in r2* r1*.
By the 1-boundedness of r1 and r2, r1r2 must be contained in one of ε, r1, r2 and r2r1 (see
Figure 2.8). In each case, the head mapping induces the assignment set {X := X, Y :=
Y, Z := Z, W := W}. However, in the first three containments, the destination for the
p-atom generated by the expansion is inconsistent with the head mapping, and we may
conclude that r1r2 ⊆ r2r1.

Consider any conjunct mapping M : r2r1 ⇒ r1r2 (see Figure 2.8). As we observed, the
head mapping yields the assignment {X := X}. Now, by definition (see the introduction
to this section), the first argument of every atom in the conjunctions X|B2 and X|B1' is X,
and therefore both these conjunctions must map to the conjunction X|B1 in r1r2. Hence,
the conjunct mapping M must map X|B2 and X|B1' into X|B1, so B2 ⇒ B1, as required.

For the converse, assume that B2 ⇒ B1. Then the mapping M indicated in Figure 2.8
proves that r1r2 ⊆ r2r1, which in turn suffices to prove sequencability by Theorem 1.9. □

In view of the lack of a known algorithm to detect sequencability, a variety of conditions
have been proposed that are sufficient (but not necessary) to detect sequencability in pairs
of linear rules. The most popular condition is that of commutativity, r1r2 ⊆ r2r1. The more
general condition r1r2 ⊆ r2r1* was proposed independently by Ramakrishnan et al. ([25])
and Ioannidis ([17]); the former shows that the condition is verifiable in polynomial space.
The following theorem considers the complexity of such conditions.

ε :    p(X,Y,Z,W)
       p(X,Y,Z,W)

r1 :   p(X,Y,Z,W)
       p(Y,X,Z,W), X|B1

r2 :   p(X,Y,Z,W)
       p(X,Y,W,Z), X|B2

r1r2 : p(X,Y,Z,W)
       p(Y,X,Z,W), X|B1
       p(Y,X,W,Z), Y|B2'

r2r1 : p(X,Y,Z,W)
       p(X,Y,W,Z), X|B2
       p(Y,X,W,Z), X|B1'

Figure 2.8: The construction of Theorem 2.11

Theorem 2.12 Let r1 and r2 be rules obeying the conditions of Theorem 2.11. Then testing
each of the following conditions, each sufficient to prove the sequencability of r1 under r2,
is NP-hard.

(a) r1 ⊆ r2

(b) r1 = r2

(c) r1 ⊆ r2*

(d) r1r2 ⊆ r2r1

(e) r1r2 = r2r1

(f) r1r2 ⊆ r2r1*.

Proof. Let I be a 3SAT instance, and let B1 and B2 be the conjunctions resulting from
the application of Algorithm 2.4. We provide reductions such that conditions (a) - (f) are
satisfied iff there is a conjunct mapping M : B2 ⇒ B1.

To prove (a), (b) and (c), we construct the rules

r1 : p(X,Y) :- p(Y,X), B1.

r2 : p(X,Y) :- p(Y,X), B2.

The observation that each rule is 1-bounded, and that B1 ⇒ B2 by construction, suffices
to complete the proof.

The construction of Theorem 2.11 yields NP-hardness reductions for conditions (d) -
(f). □

Conditions (a), (b), (d) and (e) are each in NP; conditions (c) and (f) may be tested
in polynomial space by the chase algorithm of Ramakrishnan et al. ([25]).

Theorem 2.12 implies that the conditions (a) through (f) can probably not be tested
in polynomial time. However, conditions (a), (b), (d) and (e) reduce to the testing of
containments among pairs of conjunctive queries, and Theorem 2.4 may apply to these
conditions over various classes of rules that arise in practice. For example, if r1 and r2 each
have no repetitions of predicates in their respective bodies, then this algorithm provides a
polynomial-time test of conditions (a), (b), (d) and (e). Ioannidis ([16]) has also proposed
an algorithm to test commutativity (condition (e)) in this restricted case.
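Conditions (d) and (e) reduce to containment tests between the two composed expansions, which can be sketched as follows. This is an illustration, not the algorithm of Ioannidis or Algorithm 2.3: the containment test is a naive backtracking search, the rule encoding (head, recursive atom, list of EDB atoms) is an assumption, and `compose(outer, inner)` follows the convention of Figure 2.8 that the first-named rule is applied at the root.

```python
def find_mapping(src_atoms, dst_atoms, mapping):
    """Backtracking search for a conjunct mapping that extends `mapping`."""
    if not src_atoms:
        return mapping
    first, rest = src_atoms[0], src_atoms[1:]
    for target in dst_atoms:
        if target[0] != first[0] or len(target) != len(first):
            continue
        extended = dict(mapping)
        if all(extended.setdefault(v, t) == t for v, t in zip(first[1:], target[1:])):
            result = find_mapping(rest, dst_atoms, extended)
            if result is not None:
                return result
    return None


def contained_in(q1, q2):
    # q1 is contained in q2 iff the body of q2 maps into the body of q1 with
    # the shared head variables mapped to themselves.
    (head1, body1), (head2, body2) = q1, q2
    return find_mapping(body2, body1, {v: v for v in head1[1:]}) is not None


def compose(outer, inner):
    """Expand the recursive atom of `outer` using `inner`; return a query."""
    head, rec, edb = outer
    inner_head, inner_rec, inner_edb = inner
    subst = dict(zip(inner_head[1:], rec[1:]))

    def rename(v):
        # Nondistinguished variables of the inner rule are primed.
        return subst.get(v, v + "'")

    renamed = [(a[0], *map(rename, a[1:])) for a in [inner_rec] + inner_edb]
    return head, renamed[:1] + edb + renamed[1:]


def commute(ra, rb):
    """Condition (e): ra rb and rb ra are equivalent."""
    ab, ba = compose(ra, rb), compose(rb, ra)
    return contained_in(ab, ba) and contained_in(ba, ab)


left = (("p", "X", "Y"), ("p", "U", "Y"), [("a", "X", "U")])    # extends paths on the left
right = (("p", "X", "Y"), ("p", "X", "V"), [("b", "V", "Y")])   # extends paths on the right
right_a = (("p", "X", "Y"), ("p", "X", "U"), [("a", "U", "Y")])
print(commute(left, right), commute(right_a, right))
```

Condition (d) alone is the single call `contained_in(compose(ra, rb), compose(rb, ra))`.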

2.3.4  Basis-linearizability 
Consider the safe, simple recursive rule

r1 : p(X) :- p(U1), ..., p(Un), e1(W1), ..., ek(Wk).




where n > 1 and X is a vector of distinct variables. Recall that r1 is basis-linearizable if,
for every basis rule

r2 : p(X) :- b(X).

we may replace r1 with the rule

r1' : p(X) :- b(U1), ..., b(Un-1), p(Un), e1(W1), ..., ek(Wk).

to obtain an equivalent program. The linear rule r1' is obtained from the nonlinear rule r1
by replacing all but the last occurrence of p with a corresponding occurrence of the basis
predicate b. Recall that basis-linearizability indicates that right-linearity is the normal
form for the conjunctive queries generated by the program. That is, the sirup r1 is
basis-linearizable iff every top-down expansion of p(X) using r1 is contained in a right-linear
expansion.
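The syntactic step that produces r1' from r1 can be sketched as follows. The tuple encoding of rules and the name `basis_linear` are assumptions of the sketch, and it performs only the rewriting: deciding whether the rewritten program is equivalent to the original is the hard part and is not attempted here.

```python
def basis_linear(rule, p="p", b="b"):
    """Replace all but the last p-atom in the rule body by the basis predicate b.

    rule = (head, body), with atoms as tuples (predicate, arg1, ...).
    Assumes the body contains at least one p-atom.
    """
    head, body = rule
    last = max(i for i, atom in enumerate(body) if atom[0] == p)
    new_body = [(b, *atom[1:]) if atom[0] == p and i != last else atom
                for i, atom in enumerate(body)]
    return head, new_body


# Nonlinear transitive closure: p(X,Y) :- p(X,U), p(U,Y).
r1 = (("p", "X", "Y"), [("p", "X", "U"), ("p", "U", "Y")])
print(basis_linear(r1))
```

Applied to the nonlinear transitive-closure rule, the rewriting gives the right-linear rule p(X,Y) :- b(X,U), p(U,Y), the canonical example mentioned in the abstract.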

Linearizability of this sort was investigated by Zhang et al. ([40]³), who claim a
polynomial-time decision procedure for bilinear, function-free rules with one nonrecursive
subgoal (i.e., n = 2 and k = 1), although (as we will show in the next chapter) their proof
is flawed. In Chapter 3, we will extend their result to include all bilinear recursions, as long
as no nonrecursive predicate appears more than once in the body of the rule. We will also
show that the algorithm runs in polynomial time.

Basis-linearizability is not known to be decidable in the case in which predicates are
allowed to appear repetitively among the subgoals. Ramakrishnan et al. ([25]) show that
detecting basis-linearizability in bilinear recursions is NP-hard; their reduction involves a
recursive predicate of arity 6, and places no bound on the number of predicate repetitions
in the body of the rule. We tighten that result in the following theorem.

Theorem 2.13 Let r1 be a bilinear rule as above, with the restrictions that p has arity 2,
all other predicates have arity at most 5, and no predicate appears more than four times in
the body of r1. Then, deciding basis-linearizability is NP-hard for rules of this form.

Proof. Let I be a 3SAT instance to which Algorithm 2.4 has been applied to produce
conjunctions B1 and B2. We construct a rule r1 that is basis-linearizable iff there is a
conjunct mapping M : B2 ⇒ B1. Note that B1 ⇒ B2 by construction. The rule is

r1 : p(X,Y) :- p(X,U), p(T,Y), e(X,Y), e(V,Y), f(X,U), f(V,U), X|R|B1, V|S|B2.

where X, Y, U, T, V, R and S are new and distinct variables.

If the rule r1 is basis-linearizable, then the tree T1 in Figure 2.9 is contained in a
right-linear tree (a top-down expansion in which only p(T,Y) is expanded through r1). By
convention, the empty rule

ε : p(X,Y) :- p(X,Y).

is a right-linear tree of depth 0.

³This paper has recently been published in TODS, but the proof has been omitted.
T1 : p(X,Y)
     p(X,U) p(T,Y) e(X,Y) e(V,Y) f(X,U) f(V,U) X|R|B1 V|S|B2
     p(X,U') p(T',U) e(X,U) e(V',U) f(X,U') f(V',U') X|R'|B1' V'|S'|B2'

T2 : p(X,Y)
     p(X,U) p(T,Y) e(X,Y) e(V,Y) f(X,U) f(V,U) X|R|B1 V|S|B2

Figure 2.9: The construction for Theorem 2.13

Since p(X,Y) is not a leaf in T1, every destination for p(X,Y) among the leaves of T1
is inconsistent with the head mapping, and we may conclude that T1 ⊄ ε; that is, T1 is
contained in a right-linear tree T2 in which the head is expanded using rule r1.

Now, the head mapping from T2 into T1 induces the assignments {X := X, Y := Y}.
Hence, the atom p(X,U) must have destination p(X,U'), and every conjunct mapping from
T2 into T1 must induce the assignment U := U'. The only destinations for e(V,Y) consistent
with these assignments force V := X or V := V, and the only such destinations for the
atom f(V,U) force the assignments V := V' or V := X. Hence, every conjunct mapping
M : T2 ⇒ T1 must induce the assignment V := X, forcing the mapping V|S|B2 ⇒ X|R|B1
or V|S|B2 ⇒ X|R'|B1'; in either case, we may conclude B2 ⇒ B1.

For sufficiency, assume that B2 ⇒ B1. Then, the conjunct mappings in the previous
paragraph, along with the partial mappings p(T,Y) ⇒ p(T,Y) and X|R|B1 ⇒ X|R|B1,
suffice to prove basis-linearizability by Theorem 1.18. □

Finally, we justify our claim that Algorithm 1.2 (in Section 1.5.3) runs in polynomial
time.

Theorem 2.14 Algorithm 1.2 runs in polynomial time.

Proof. Each containment test in the algorithm is an instance of the 2-containment problem.
□


Chapter  3 


A decision procedure for
basis-linearizability

3.1  Introduction 

Consider the safe and function-free (“Datalog”) logic program V with a single doubly-recursive (“bilinear”) rule of the form

r1 : p(X1,...,Xm) :- p(1)(Y), p(2)(Z), e1(U1), ..., eN(UN).

and a single basis rule of the form

r2 : p(X1,...,Xm) :- b(X1,...,Xm).

where we have subscripted the recursive occurrences of p for ease of reference. We will refer to atoms in the body of the recursive rule by their principal functors, using the subscripts to disambiguate the recursive atoms; that is, the term “p(1)” will be used to refer to p(1)(Y) (the first recursive p-atom in the body of r1), and the term “e_i” to refer to the atom e_i(U_i). We assume that the rules r1 and r2 satisfy the following requirements.

1. The variables appearing in the head are distinct. Recall that these variables are termed distinguished, and that all other variables are termed nondistinguished.

2. The rules are range-restricted or safe; that is, every distinguished variable appears in the rule body.

3. The base predicate b and the subgoals e_i are EDB predicates, and these predicates are distinct. Recall from Chapter 1 that EDB relations are stored by extension in the database, and that the predicate p is termed intensional or IDB.

Recall that V is termed linearizable by basis iff right-linearity is a normal form for the proof trees (or conjunctive queries) generated by the program. That is, V is basis-linearizable iff V is equivalent to the following linear program Q.

r1 : p(X1,...,Xm) :- b(Y), p(2)(Z), e1(U1), ..., eN(UN).

r2 : p(X1,...,Xm) :- b(X1,...,Xm).

Q is obtained from V by replacing the first recursive occurrence of p in the body of r1 (that is, the atom p(1)(Y)) with the base atom b(Y).

Example 3.1 We repeat here the program of Example 1.1, computing the transitive closure of b.

r1 : p(X,Y) :- p(X,U), p(U,Y).

r2 : p(X,Y) :- b(X,Y).

p(X,U), the first recursive atom in r1, is referred to as p(1), and p(U,Y) as p(2). Recall that we proved this program to be basis-linearizable in Section 1.5.2; that is, the program is equivalent to the following linear logic program.

r1 : p(X,Y) :- b(X,U), p(U,Y).

r2 : p(X,Y) :- b(X,Y).

The gains of performing the transformation are obtained from the use of query evaluators specific to linear recursions. □
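The equivalence of the two transitive-closure programs can be checked on a small instance. The following Python sketch is only an illustration: the function names, the naive fixpoint loops and the sample relation are assumptions made for this example and do not appear in the original text. It evaluates the bilinear program and the linear program bottom-up and compares the resulting p-relations.

```python
def eval_bilinear(b):
    """Naive fixpoint of  p(X,Y) :- p(X,U), p(U,Y).   p(X,Y) :- b(X,Y)."""
    p = set(b)
    while True:
        new = {(x, y) for (x, u) in p for (u2, y) in p if u == u2} - p
        if not new:
            return p
        p |= new


def eval_linear(b):
    """Naive fixpoint of  p(X,Y) :- b(X,U), p(U,Y).   p(X,Y) :- b(X,Y)."""
    b = set(b)
    p = set(b)
    while True:
        new = {(x, y) for (x, u) in b for (u2, y) in p if u == u2} - p
        if not new:
            return p
        p |= new


# Hypothetical sample relation b, chosen only for illustration.
edges = {(1, 2), (2, 3), (3, 4), (4, 2)}
print(eval_bilinear(edges) == eval_linear(edges))
```

Agreement on one database illustrates the equivalence but does not prove it; the proof is the one referred to in Section 1.5.2.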

In this chapter, we present a decision procedure for the recognition of basis-linearizability in bilinear recursions of the form of V. The running time of the algorithm is polynomial¹ in the size of the program V.

3.1.1  Related  results 

Consider  the  program  V  of  the  preceding  section.  As  we  showed  in  Chapter  2,  if  repetitions 
are  allowed  in  the  EDB  subgoals  of  the  recursive  rule  in  this  program,  then  the  detection 
of basis-linearizability is NP-hard; in fact, the decidability of basis-linearizability for such
programs  is  open.  In  this  chapter,  we  show  that  if  no  such  repetitions  are  allowed,  then 
basis-linearizability  may  be  detected  in  polynomial  time;  hence,  in  some  sense,  our  result 
represents  a  boundary  between  tractability  and  intractability.  Finally,  as  we  will  show  in  the 
next  chapter,  if  we  consider  programs  with  a  single  bilinear  rule,  an  unbounded  number  of 
linear  rules  and  5  basis  rules,  then  the  detection  of  basis-linearizability  becomes  undecidable. 

This chapter is an extension of the work of Zhang, Yu and Troy ([40])², who proposed the problem of basis-linearizability in the restricted case N ≤ 1; that is, in the case in which the recursive rule has at most one EDB subgoal. They claim a polynomial time algorithm for this case; however, the proof of correctness of their algorithm is flawed, and we will touch upon this flaw in Section 3.5.3. Their algorithm also ignores so-called deletion-linearizable recursions, on the grounds that such programs may be linearized in a different way (see Section 3.4).


¹ In fact, it can be performed in linear time.

² This result has recently been published in TODS ([41]), but the proof has been omitted.




The  algorithm  of  [40]  does  not  extend  in  an  obvious  way  to  the  programs  that  we 
consider (in which N is unbounded). That is, the representation of e1,...,eN as their
“join”  (a  single  atom),  followed  by  an  application  of  the  algorithm  of  [40],  is  insufficient  to 
detect  basis-linearizability.  The  following  example  illustrates  this  point. 

Example 3.2 Consider the program

r1 : p(X,Y) :- p(X,U), p(U,Y), c(U), d(Y).

r2 : p(X,Y) :- b(X,Y).

where b, c and d are distinct EDB predicates. This program is basis-linearizable, as the algorithm of Section 3.2 shows. Assume that we represent the EDB predicates c and d by their Cartesian product, in some new EDB predicate e. Then, we obtain the program

r1 : p(X,Y) :- p(X,U), p(U,Y), e(U,Y).

r2 : p(X,Y) :- b(X,Y).

which is not basis-linearizable, as the algorithm of Section 3.2 also shows. □
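The difference between the two programs of this example can be explored mechanically. The sketch below is an illustration under stated assumptions, not the algorithm of Section 3.2: it reads the fourth subgoal of the first program as d(Y), so that the join of c and d is e(U,Y); it takes "depth" to mean the number of applications of the recursive rule; and it reads "acceptable" as "no p(1)-leaf is sent to the p(2)-child of the root of T1". All function and variable names are hypothetical. The code builds the minimal violation T1 and the right-linear expansions of depth 0, 1 and 2, and searches by brute force for an acceptable containment mapping, which is only the third condition of Theorem 3.1.

```python
from itertools import product

HEAD = ("X", "Y")  # distinguished variables of the recursive rule


def expand(args, body, tag):
    """Apply the recursive rule to p(args); nondistinguished variables get suffix `tag`."""
    sub = dict(zip(HEAD, args))
    return [(q, tuple(sub.get(v, v + tag) for v in vs)) for q, vs in body]


def right_linear(body, depth):
    """Return (p(1)-leaves, other leaves) after `depth` rule applications along p(2)."""
    p1_leaves, others, current = [], [], HEAD
    for level in range(depth):
        atoms = expand(current, body, "'" * level)
        p1_leaves.append(atoms[0])   # p(1) is never expanded
        others += atoms[2:]          # EDB subgoals
        current = atoms[1][1]        # arguments of p(2), expanded next
    return p1_leaves, others + [("p", current)]


def minimal_violation(body):
    """Return (leaves of T1, p(2)-child of the root): expand the root, then p(1) once."""
    top = expand(HEAD, body, "")
    return expand(top[0][1], body, "'") + top[1:], top[1]


def has_acceptable_mapping(body, depth):
    """Brute-force search over all assignments of source variables to variables of T1."""
    p1_leaves, others = right_linear(body, depth)
    t1, p2_root = minimal_violation(body)
    src = p1_leaves + others
    src_vars = sorted({v for _, vs in src for v in vs} - set(HEAD))
    dst_vars = sorted({v for _, vs in t1 for v in vs})
    for image in product(dst_vars, repeat=len(src_vars)):
        f = {h: h for h in HEAD}     # containment mappings fix the head variables
        f.update(zip(src_vars, image))

        def img(atom):
            return (atom[0], tuple(f[v] for v in atom[1]))

        if all(img(a) in t1 for a in src) and all(img(a) != p2_root for a in p1_leaves):
            return True
    return False


# Rule bodies: p(1) first, p(2) second, EDB subgoals after.
with_c_and_d = [("p", ("X", "U")), ("p", ("U", "Y")), ("c", ("U",)), ("d", ("Y",))]
with_join_e = [("p", ("X", "U")), ("p", ("U", "Y")), ("e", ("U", "Y"))]
print([has_acceptable_mapping(with_c_and_d, d) for d in range(3)])
print([has_acceptable_mapping(with_join_e, d) for d in range(3)])
```

Under these assumptions the search finds an acceptable mapping for the first program only, at depth 2, and none for the program with the joined predicate e. The sketch does not examine the adjunct conditions of Theorem 3.1.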

The proof of [40] also does not extend directly to the proof of correctness of our algorithm. Both their proof and ours rely heavily on the idea of safety; that is, on the requirement that every distinguished variable appears in the rule body. In the case N = 1, every distinguished variable X_i must appear among the arguments of p(1), p(2) or the single EDB subgoal e1 by the assumption of safety; however, if N is unbounded, then X_i may appear as the only argument to some “new” EDB predicate e_j.

3.2  The  algorithm 

In  this  section,  we  present  an  algorithm  that  decides  whether  the  program  V  is  basis- 
linearizable,  and  show  that  the  algorithm  is  polynomial  in  the  size  of  V. 

A nondistinguished variable that appears only in the arguments of the atom t in r1 is said to be local to t; all other variables are termed nonlocal.

Recall  that  we  will  refer  to  atoms  in  the  body  of  the  rules  in  V  by  their  principal  functors, 
using  subscripts  to  disambiguate  the  recursive  p-atoms. 

Definition 3.1 We say that p(1) is an adjunct to p(2) in V if there is a partial mapping p(1) → p(2) that induces an assignment set that is the identity on nonlocal variables in p(1). That is, p(1) is obtainable from p(2) by replacing 0 or more occurrences of each variable in p(2) with a new, local variable. If p(1) is an adjunct, then V is said to be deletion-linearizable. □


The intuitive importance of adjuncts is that they may be deleted from the recursive rule to produce an equivalent, linear program, as we will show in Section 3.4.

Definition 3.2 Define r1 - p(1) to be the rule r1 in which p(1) has been deleted, and let V - p(1) be the resulting program. If p(1) is an adjunct to p(2), then every distinguished variable appearing among the arguments of p(1) also appears in p(2), so we conclude that this deletion preserves safety. □

For any atom t in the rule r1, or in any top-down expansion, let us use the notation t[i] to refer to the ith argument of t. We will always reserve the symbol X_i to refer to the ith distinguished variable in the head of the recursive rule.

Definition 3.3 The home position of any distinguished variable X_i is the ith position in any p-atom. □

Definition 3.4 p(1) is said to be a trivial adjunct to p(2) if p(1) is an adjunct, p(1) contains no nonlocal nondistinguished variables and, whenever the distinguished variable X_i appears in p(1), then p(2)[i] is X_i (that is, X_i appears in its home position in p(2)). □
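The definitions of adjunct and trivial adjunct are purely syntactic, so they can be tested directly on the argument lists of the recursive rule. The following Python sketch is one possible reading of Definitions 3.1 and 3.4; the function name, the tuple encoding of atoms and the sample rules in the comments are assumptions made for illustration.

```python
def classify_p1(head, p1, p2, edb):
    """Classify p(1) relative to p(2) as 'trivial adjunct', 'adjunct' or 'not an adjunct'.

    head, p1 and p2 are tuples of variable names (the arguments of the head,
    of p(1) and of p(2)); edb is a list of argument tuples of the EDB subgoals.
    """
    # A variable of p(1) is local iff it occurs nowhere else in the rule.
    elsewhere = set(head) | set(p2) | {v for args in edb for v in args}
    f = {}
    for a, b in zip(p1, p2):
        if f.setdefault(a, b) != b:      # the partial mapping must be a function
            return "not an adjunct"
        if a in elsewhere and a != b:    # identity on nonlocal variables
            return "not an adjunct"
    for v in set(p1):
        if v in head:
            if p2[head.index(v)] != v:   # X_i must be in its home position in p(2)
                return "adjunct"
        elif v in elsewhere:             # nonlocal nondistinguished variable in p(1)
            return "adjunct"
    return "trivial adjunct"


# Transitive closure: p(X,Y) :- p(X,U), p(U,Y).
print(classify_p1(("X", "Y"), ("X", "U"), ("U", "Y"), []))
# Hypothetical rule: p(X,Y) :- p(A,Y), p(U,Y), e(X,U).
print(classify_p1(("X", "Y"), ("A", "Y"), ("U", "Y"), [("X", "U")]))
# Hypothetical rule: p(X,Y) :- p(A,U), p(X,U), e(U,Y).
print(classify_p1(("X", "Y"), ("A", "U"), ("X", "U"), [("U", "Y")]))
```

In the transitive-closure rule the distinguished variable X would have to be mapped to U, so p(1) is not an adjunct. In the second rule the only change from p(2) to p(1) is the fresh local variable A, and Y stays in its home position. In the third rule the shared variable U is nonlocal and nondistinguished, so the adjunct is not trivial.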


Example 3.3 Consider the program V defined as below.

r1 : p(X,Y,W,Z) :- p(U,X,A,B), p(X,X,A,W), e(Y,W,Z).

r2 : p(X,Y,W,Z) :- b(X,Y,W,Z).

The nondistinguished variables U and B are local to p(1) (that is, to the atom p(U,X,A,B)). The atom p(1) is an adjunct to the atom p(2) (that is, the atom p(X,X,A,W)); in fact, p(1) is a trivial adjunct since X appears in its home position in p(2). The rule r1 - p(1) is presented below.

r1 - p(1) : p(X,Y,W,Z) :- p(X,X,A,W), e(Y,W,Z).

Note that this rule is safe. □

Recall from Chapter 1 that top-down expansions generated by V are written in a way that preserves the left-to-right order of the subgoals in every rule, and that an expansion or proof tree is termed right-linear if only the rightmost p-atom in the recursive rule is ever recursively expanded. Finally, recall that a top-down expansion generated by V is termed open if only the recursive rule is used in constructing the expansion.

Example 3.4 The expansions T1, T2, T3 and T4 of Figure 3.1 are all open. T1 is the minimal violation of right-linearity in V, and the expansions T2, T3 and T4 are the right-linear expansions of depth 0, 1 and 2 respectively. □

We will use the terms tree and expansion interchangeably. As we mentioned in Chapter 1, these expansions may all be constructed in time that is polynomial in the size of V.

Now, let f be a containment mapping (equivalently, M a conjunct mapping) from T2, T3 or T4 into T1. Recall from Section 1.5.2 that the mapping f (or M) is termed acceptable if, under the mapping, the p(2)-child of the root of T1 is the destination of no p(1)-leaf.
Figure 3.1: Expansions used in the algorithm.

Figure 3.2: Acceptable mapping. T1 has the leaves p(1)p(1) = p(X,U'), p(1)p(2) = p(U',U) and p(2) = p(U,Y); T4 has the leaves p(1) = p(X,U), p(2)p(1) = p(U,U') and p(2)p(2) = p(U',Y).

Example 3.5 Consider the transitive-closure program of Example 3.1. The containment mapping f defined by f(X) = X, f(Y) = Y, f(U) = U', f(U') = U is an acceptable mapping from T4 into T1, as indicated in Figure 3.2. □
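The mapping of this example can be verified leaf by leaf. The Python sketch below is illustrative: the leaf names follow the naming convention of the chapter, the dictionaries and helper functions are hypothetical, and "acceptable" is read as "no p(1)-leaf is sent to the p(2)-child of the root of T1" (the reading suggested by the unacceptable mapping f(p(1)) = p(2) discussed in Section 3.5.3).

```python
# Leaves of T1 and T4 for the transitive-closure program, keyed by leaf name.
T1 = {"p(1)p(1)": ("X", "U'"), "p(1)p(2)": ("U'", "U"), "p(2)": ("U", "Y")}
T4 = {"p(1)": ("X", "U"), "p(2)p(1)": ("U", "U'"), "p(2)p(2)": ("U'", "Y")}
f = {"X": "X", "Y": "Y", "U": "U'", "U'": "U"}


def destinations(src, dst, f):
    """Return {source leaf: destination leaf}, or None if f is not a containment mapping."""
    by_args = {args: name for name, args in dst.items()}
    dest = {}
    for name, args in src.items():
        image = tuple(f[v] for v in args)
        if image not in by_args:
            return None              # some leaf has no destination in dst
        dest[name] = by_args[image]
    return dest


def is_acceptable(dest):
    """No p(1)-leaf may be sent to the p(2)-child of the root of T1."""
    return dest is not None and all(
        d != "p(2)" for name, d in dest.items() if name.endswith("p(1)"))


dest = destinations(T4, T1, f)
print(dest, is_acceptable(dest))
```

Under f, the two p(1)-leaves of T4 land on the two leaves below p(1) in T1, and only the leaf p(2)p(2) is sent to the p(2)-child of the root, so the mapping is acceptable in the sense assumed above.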

We  now  state  the  main  results  of  this  chapter. 

Theorem  3.1  V  is  basis-linearizable  iff  at  least  one  of  the  following  is  true. 

(1) p(1) is a trivial adjunct to p(2) in V;

(2) p(1) is an adjunct, and V - p(1) is 1-bounded; or

(3) There is an acceptable mapping from one of T2, T3 and T4 into T1.

□ 

The proof of this theorem is an extensive combinatorial analysis, and will occupy the
remainder  of  this  chapter.  Before  we  begin  this  proof,  let  us  observe  that  the  algorithm 
implied  by  Theorem  3.1  is  polynomial  in  the  size  of  V. 

Theorem 3.2 The procedure of Theorem 3.1 is in NLOGSPACE.

Corollary 1. The procedure of Theorem 3.1 is in NC.

Corollary 2. The procedure of Theorem 3.1 is polynomial.

Proof. Whether p(1) is an adjunct, or a trivial adjunct, is easily determined in LOGSPACE, so Condition (1) is in LOGSPACE. V - p(1) is a linear sirup with no repeated EDB subgoals; by Theorem 2.9, Condition (2) is in NLOGSPACE. By a case analysis on the destinations of the p(1)-atoms in T3 and T4, Condition (3) may be tested through nine 2-containment tests, each of which may be accomplished in NLOGSPACE by Theorem 2.4. Hence, the algorithm is in NLOGSPACE. Corollary 1 follows by [9], and Corollary 2 immediately follows. □

3.3  Proof  Outline 

In Section 3.4, we will investigate the behaviour of programs in which p(1) is an adjunct to p(2), and show that in this case, the recursive atom p(1) is redundant in the recursive rule r1. Section 3.5 contains the proof of Theorem 3.1.

3.4  Adjuncts 

Let us assume that p(1) is an adjunct to p(2) in V. It turns out that, in this case, the atom p(1) is redundant in the recursive rule r1; hence, we call such programs deletion-linearizable. We will prove this fact in the remainder of this section. As before, by the form of the basis rule, we will restrict our attention to proof trees (ground or otherwise) in which the basis rule is never used. That is, we adopt the convention that p itself is both an EDB and an IDB relation. In this section, we will consider ground proof trees or expansions; that is, trees in which all variables have been replaced by constants.

An  important  property  of  adjuncts  is  treated  in  the  following  lemma. 

Lemma 3.1 If p(1) is an adjunct to p(2), then any p-fact that unifies with p(2) also unifies with p(1), and the unifications agree on the values of all nonlocal variables in p(1).

Proof. Let f represent the assignment set induced by the partial mapping p(1)(Y) → p(2)(Z), and assume that p(a1,...,am) unifies with p(2)(Z) under the substitution τ. The function τ(f) is then a substitution under which p(1) unifies with p(a1,...,am). Since f is the identity on nonlocal variables, τ and τ(f) agree on these variables. □


Definition 3.5 Let T be a proof tree (or expansion) generated by V. The right strut of T is the proof tree (or expansion) obtained from T by discarding all p(1)-atoms. □

Lemma 3.2 If p(1) is an adjunct, then for any tree T generated by V from any database D, the right strut of T is a proof tree generated by V - p(1) from D and yielding the same fact as T.

Corollary. V ⊆ V - p(1).

Proof. Straightforward induction on the depth of the right strut of T. □

Example 3.6 Consider the program V of Example 3.3, in which p(1) is an adjunct to p(2). The right strut of the expansion of Figure 3.3 (a) is shown in Figure 3.3 (b). □
Figure 3.3: Right strut. (a) An expansion generated by V; (b) its right strut.


Lemma 3.3 Consider any database D, and assume that p(1) is an adjunct. For n ≥ 1, if S is a proof tree of depth n generated by V - p(1) from D, then there is a complete tree R of depth n that is generated by V from D such that S is the right strut of R.

Corollary 1. V - p(1) ⊆ V.

Corollary 2. V = V - p(1).

Proof. The proof is a straightforward induction on n, using Lemma 3.1. For the basis (n = 1), let S be the proof tree

p(a)

p(2)(b) e1(c1) ... eN(cN)

By Lemma 3.1, V generates the proof tree R defined as

p(a)

p(1)(b) p(2)(b) e1(c1) ... eN(cN)

For the induction, assume the truth of the hypothesis for 1 ≤ i < n. Let S be a tree of depth n as in Figure 3.4, in which the top level of the expansion is the ground query

p(a) :- p(2)(b), e1(c1), ..., eN(cN).

and where p(2)(b) is generated from V - p(1) by a tree I of depth n - 1. By hypothesis, V generates a complete tree J of depth n - 1 from the leaves of I, establishing the fact p(b), for which I is a right strut. By Lemma 3.1, V generates a complete proof tree of depth n establishing the fact p(a), such that the first level of the proof tree is the query

p(a) :- p(1)(b), p(2)(b), e1(c1), ..., eN(cN).

and where the subtrees establishing p(1) and p(2) are both J (see Figure 3.4).

The proof of Corollary 1 is immediate. Corollary 2 is proved using Lemma 3.2. □
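The redundancy of an adjunct can be observed on a concrete instance. The Python sketch below uses a hypothetical rule that is not taken from the text, r1 : p(X,Y) :- p(A,Y), p(U,Y), e(X,U), in which p(1) = p(A,Y) is an adjunct to p(2) = p(U,Y). The evaluator, its parameter names and the sample database are assumptions made for illustration. It computes the p-relation with and without the atom p(1).

```python
def evaluate(b, e, keep_p1=True):
    """Naive fixpoint of  p(X,Y) :- [p(A,Y),] p(U,Y), e(X,U).   p(X,Y) :- b(X,Y).

    With keep_p1=False the atom p(1) = p(A,Y) is deleted from the recursive rule.
    """
    p = set(b)
    while True:
        derived = set()
        for (x, u) in e:
            for (u2, y) in p:
                if u2 != u:
                    continue
                # p(1) = p(A,Y) only asks for some p-fact whose second argument is y;
                # the fact matched by p(2) already provides one (compare Lemma 3.1).
                if keep_p1 and not any(y2 == y for (_, y2) in p):
                    continue
                derived.add((x, y))
        if derived <= p:
            return p
        p |= derived


# Hypothetical sample database.
b = {(1, 2), (3, 4)}
e = {(5, 1), (6, 5), (7, 3)}
print(evaluate(b, e, keep_p1=True) == evaluate(b, e, keep_p1=False))
```

Any fact that satisfies p(2) also satisfies p(1), so the extra test never rejects a derivation and both programs compute the same relation on this database.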
Figure 3.4: The induction

Figure 3.5: Right-linear expansion


3.5  Proof 

This  section  will  be  devoted  to  the  proof  of  Theorem  3.1. 

Recall that an expansion is termed open if the basis rule is never used in constructing the expansion, and closed iff there are no p-leaves in it; that is, the basis rule is used to “close off” all intensional atoms in the latter case. Recall also that an expansion or proof tree is termed right-linear if only the p(2)-atom in the recursive rule is ever recursively expanded (see Figure 3.6).

By definition, V is basis-linearizable iff every non-right-linear closed expansion is contained in some closed right-linear expansion (see Section 1.4.3). However, the form of the basis rule permits us to deal exclusively with open expansions; that is, the application of the basis rule to an open expansion amounts to the “renaming” of every p-leaf in the open expansion to the EDB predicate b, and every closed expansion is obtainable in this manner. Thus, V is basis-linearizable iff every open expansion is contained in a right-linear, open expansion. We may now think of the basis-linearizability of the sirup r1 as the right-linearity of the proof trees generated by r1 from every database, for every set of initialisation rules. The proof of correctness of Theorem 3.1 will be based exclusively on open expansions.

The  outline  of  the  proof  is  as  follows. 

3.5.1  Proof  outline 

1. In Section 3.5.2, we will show that the conditions of Theorem 3.1 are sufficient to show that each non-right-linear expansion is contained in a right-linear expansion; that is, that these conditions suffice to prove basis-linearizability in V.

2. Section 3.5.3 is the heart of this chapter. In this section, we prove that the conditions of Theorem 3.1 are necessary for V to be basis-linearizable. The proof of necessity proceeds as follows, under the assumption of basis-linearizability in V. Note that, in this case, the minimal violation of right-linearity (expansion T1 in Figure 3.6) must be contained in some right-linear expansion (T5 in Figure 3.6).

(a) If p(1) is an adjunct to p(2), then p(1) is trivial or V - p(1) is one-bounded.

(b) Assume that the minimal violation of right-linearity (expansion T1 in Figure 3.7) is contained in a right-linear expansion of depth at most 1 (one of T2 and T3 in Figure 3.7); then, the containment is provable by an acceptable containment mapping.

(c) Assume that T1 is contained in a right-linear expansion of depth at least 2 (see Figure 3.8). Then one of the following must hold.

• p(1) is an adjunct to p(2).

• T1 is contained in a right-linear expansion of depth at most 1.

• T1 is contained in the right-linear expansion of depth 2, and the containment is provable by an acceptable mapping. In fact, the mapping must have the form shown in Figure 3.9.

Figure 3.6: Necessity: T1 ⊆ T5

3.5.2  Sufficiency 

In  this  section,  we  prove  that  the  conditions  of  Theorem  3.1  are  each  sufficient  to  prove 
basis-linearizability in V. The sufficiency of Condition 3 follows by Theorem 1.8. Consider
Conditions  1  and  2. 

Recall our convention that t[i] refers to the ith argument of the atom t, and that p(1) is said to be a trivial adjunct to p(2) if p(1) is an adjunct, p(1) contains no nonlocal nondistinguished variables and, whenever the distinguished variable X_i appears in p(1), then p(2)[i] is X_i (that is, X_i appears in its “home” position in p(2)). Throughout this chapter, X_i will always be used to refer to the ith distinguished variable in the head of the recursive rule.

Figure 3.7: Assume T1 ⊆ T2 or T1 ⊆ T3

Figure 3.8: Assume T1 ⊆ T5

Figure 3.9: Show T1 ⊆ T4

Consider any tree T. We assume that every nondistinguished variable U in r1 is renamed by “priming” at each successive level of the expansion; that is, the version of U in sibling atoms in T is renamed to the new variable obtained by superscripting U with a prime i times for some i (written U^(i)). If T is a linear expansion (that is, at most one p-atom is recursively expanded at any level), then the version of U at depth i in the expansion is U^(i-1), so that U itself appears at depth 1, U' at depth 2, and so on, as in Figure 3.10. Figure 3.6 illustrates this convention.

Let q = q^1 represent the version of the atom q at depth 1 in a tree T. If s is an atom in T, then p(j)s represents the atom s in the subtree T of some tree S that is rooted at the atom p(j). By convention, p(j)^(i+1) is the p(j)-atom obtained by expanding p(j)^i through the recursive rule r1.

Example  3.7  Consider  the  transitive-closure  program  of  Example  3.1.  The  right-linear 
expansion  of  depth  3  has  leaves  and  arguments  as  shown  in  Figure  3.10.  □ 
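The renaming convention can be made concrete for the transitive-closure rule. The following Python sketch is illustrative only; the function name and the tuple encoding of atoms are assumptions. It lists the p-leaves of the right-linear expansion obtained by a given number of applications of the rule p(X,Y) :- p(X,U), p(U,Y), priming U once more at each successive level.

```python
def right_linear_leaves(depth):
    """Leaves of the right-linear expansion of p(X,Y) :- p(X,U), p(U,Y).

    `depth` counts applications of the recursive rule; the version of U
    introduced by the (level + 1)-th application carries `level` primes.
    """
    leaves, left = [], "X"
    for level in range(depth):
        u = "U" + "'" * level
        leaves.append(("p", left, u))   # the p(1)-leaf created at this level
        left = u                        # p(2) = p(u, Y) is expanded next
    leaves.append(("p", left, "Y"))     # the single unexpanded p(2)-leaf
    return leaves


for leaf in right_linear_leaves(3):
    print(leaf)
```

For depth 3 the sketch lists p(X,U), p(U,U'), p(U',U'') and p(U'',Y), which are the leaves named p(1), p(2)p(1), p(2)^2 p(1) and p(2)^3 in the convention of this section.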

Trivial  adjuncts  satisfy  the  following  property. 

Lemma 3.4 Assume that p(1) is a trivial adjunct to p(2) in V. For all right-linear expansions T of depth n, for all integers i, j < n, and for all l:

1. If p(1)[l] is the distinguished variable X_k, then p(2)^i p(1)[l] is also X_k; and

2. Any p-atom that unifies with p(2)^i p(1) unifies with p(2)^j p(1), and the unifications yield the same substitutions for all variables that are nonlocal to either of these two atoms in the conjunctive query represented by T.
p(X,Y)
  p(X,U) (p(1))    p(U,Y) (p(2))
    p(U,U') (p(2)p(1))    p(U',Y) (p(2)^2)
      p(U',U'') (p(2)^2 p(1))    p(U'',Y) (p(2)^3)

Figure 3.10: Naming and renaming conventions


Proof. Assume p(1)[l] is X_k. Since p(1) is a trivial adjunct, p(2)[k] is X_k. A straightforward induction on r ≥ 1 shows that p(2)^r[k] is X_k, which immediately implies 1.

To prove 2, we note that the only nonlocal variables that appear in p(1) are distinguished variables, which (by 1) appear in the same positions in p(2)^i p(1) and p(2)^j p(1). All other variables in p(1) are local in r1, and their primed versions therefore appear in only one atom in T (that is, the version introduced at level i appears only in the atom p(2)^i p(1)). □


Example 3.8 Consider the rule r1 in Example 3.3. Figure 3.11 depicts an expansion using this rule. Note that X persists in its home position. □

Lemma 3.5 Assume that p(1) is a trivial adjunct to p(2), and let S be a proof tree of depth k generated by V - p(1) from a database D. Then V generates a right-linear proof tree R of depth k from D such that S is its right strut (see Figure 3.12).

Corollary. If p(1) is a trivial adjunct to p(2), then V is basis-linearizable.

Proof. S has only one p-leaf, say p(2)^k(d). We may eliminate all other p-facts from D without altering the fact produced by S. By Lemma 3.3, V generates a complete tree I with depth k from D; hence, in I, the atom p(2)^(k-1) p(1) unifies with the single p-fact in D. By Lemma 3.4, all the atoms p(2)^i p(1) may be consistently unified with this p-fact to construct a right-linear proof tree R of depth k that is generated by V from D (see Figure 3.12).

The corollary is proved by observing that, by Lemma 3.2, every proof tree generated by V from a database D is also generated by V - p(1) from D. □

Lemma 3.6 Assume that p(1) is an adjunct, and that V - p(1) is 1-bounded. Then V is basis-linearizable.
Figure 3.11: Persistence of X.

Figure 3.12: Trees S and R

Figure 3.13: Assume T1 ⊆ T5


Proof. Consider any database D, and any proof tree S of depth k generated by V that establishes a fact p(a). By Lemma 3.2, there is a proof tree I of depth k generated by V - p(1) that also establishes this p-fact. If V - p(1) is 1-bounded, then either p(a) ∈ D or p(a) is generated by V - p(1) using a tree J of depth 1. Applying Lemma 3.3 completes the proof. □

3.5.3  Necessity 

This section will be devoted to the proof of necessity of the conditions in Theorem 3.1. If V is basis-linearizable, then every non-right-linear open expansion generated by V is contained in a right-linear expansion (i.e., T1 ⊆ T5 in Figure 3.13). We will focus our attention on containment mappings that must exist from right-linear expansions into the minimal violation of right-linearity (T1 in Figure 3.13).

First, we will prove that if V is basis-linearizable and p(1) is an adjunct, then p(1) is a trivial adjunct or V - p(1) is one-bounded.

Next, we will show that if T1 is contained in a right-linear expansion of depth at most 1 under an unacceptable containment mapping (see Figure 3.14), then p(1) is an adjunct to p(2).

The remaining subsections concern mappings that cannot exist from long right-linear trees into T1. Specifically, we will show that if T1 is contained in a right-linear tree T5 of depth at least 2 (see Figure 3.15), then one of the following holds.

1. p(1) is an adjunct to p(2).

2. T1 is contained in an expansion of depth at most 1.
3. T5 has depth 2, and the containment mapping is acceptable. In fact, the mapping is of the form shown in Figure 3.16.

Figure 3.14: Unacceptable mapping from T3 into T1.

Figure 3.15: Assume T1 ⊆ T5

Figure 3.16: Conclude T1 ⊆ T4

This procedure completes our proof. While reading these subsections, keep in mind the fact that if f is a containment mapping from some tree S into some tree T, the notation f(q) = w implies that q is a leaf in S and w is a leaf in T.

Notation We will use the following notation to describe the arguments of p-atoms in the rule body. The expression

p(2)( ..., A [position i], ..., X_i [position j], ..., X_j [position k], ... )

denotes the situation that some variable A appears in position i in p(2), the distinguished variable X_i appears in position j and the distinguished variable X_j appears in position k. Unless otherwise stated, A is an arbitrary variable (perhaps X_i), and any or all of {i, j, k} may be equal.

Necessity  of  conditions  1  and  2 

Assume that V is basis-linearizable. We will prove that, if p(1) is an adjunct to p(2) in V, then either V - p(1) is one-bounded or p(1) is trivial. Recall that nondistinguished variables in r1 will be consistently renamed in each tree by priming.

Lemma 3.7 If f is a containment mapping from any tree S into any tree T, then f(X_i) = X_i for every distinguished variable X_i.

Proof. Both trees have the root p(X1,...,Xm). □

Assume that p(1) is an adjunct to p(2). Consider the expansion T6 of Figure 3.17(b); since V is assumed to be basis-linearizable, the top-down expansion described by T6 in Figure 3.17 is contained in some right-linear expansion T8. Assume, now, that V - p(1) is not one-bounded; we will show that p(1) is a trivial adjunct to p(2). By Lemma 3.3, the right strut T7 of T6 is contained in T6, and we may conclude that T7 is contained in T8 (see Figure 3.17). Thus, T7 ⊆ T8, and T8 has height at least two since V - p(1) is assumed not to be 1-bounded.

Figure 3.17: T7 ⊆ T6 ⊆ T8

Let f be a containment mapping from T8 into T7. Note that the only p-leaf in T7 is p(2)p(2); this leaf must be the destination of every p-leaf in T8.

Lemma 3.8 If p(1)[j] is X_i, then p(2)[i] is X_i.

Proof. By the definition of an adjunct, p(2)[j] is X_i. The picture is

p(1)( ..., X_i [position j], ... )    p(2)( ..., C [position i], ..., X_i [position j], ... )

The mapping f(p(1)) = p(2)p(2) forces f(X_i) = C. Since f preserves distinguished variables, C = X_i. □

Lemma 3.9 No nonlocal nondistinguished variable appears in p(1).

Proof. Assume that the nonlocal nondistinguished variable A appears in position i in p(1);
by the definition of an adjunct, p(2)[i] = A. The picture is

p(1)[i] = A    p(2)[i] = A

Examine the lowest two levels of the tree Tg, which we assume has depth n + 1 for some
n > 0. The destination for p(2)^(n-1)p(1) forces f(A^(n-1)) = A'.

By safety, Xi appears in the rule body. By Lemma 3.8, Xi cannot appear in p(1).
Assume that it appears in position k in some e_q. Then, the kth argument of p(2)^n e_q
is A^(n-1). However, the kth argument of e_q is Xi by assumption, and the kth argument
of p(2)e_q is therefore A (see Figure 3.11 for an illustration). Further, these two atoms
are the only possible destinations for p(2)^n e_q because of our assumption of no repeated
EDB predicates in the recursive rule. Hence, the possible destinations for p(2)^n e_q force
f(A^(n-1)) to be a nonprimed variable, which is a contradiction. Hence, Xi must appear in
p(2) only, say in position j ≠ i. The picture is

p(1)[i] = A    p(2)[i] = A    p(2)[j] = Xi

But then, the jth argument of p(2)^(n+1) is A^(n-1), and the jth argument of p(2)p(2) is
A ≠ A', a contradiction.

□

By Lemmas 3.8 and 3.9, p(1) must be a trivial adjunct to p(2).

Figure 3.18: Minimal program: T1 ⊆ T2 or T1 ⊆ T3

Minimal programs

Let us say that V is minimal if the minimal violation of right-linearity (T1 in Figure 3.18)
is contained in a right-linear expansion of depth at most 1 (that is, in one of the expansions
T2 and T3 in the same figure).

Assume that V is basis-linearizable and minimal. We will show that if p(1) is not an
adjunct to p(2), then there is an acceptable containment mapping from T2 or T3 into T1.
Since any mapping from T2 into T1 is acceptable, we will concern ourselves only with
an unacceptable containment mapping f from T3 into T1; that is, a mapping in which
f(p(1)) = p(2) (see Figure 3.19).

Lemma 3.10 Assume that a nondistinguished variable A appears in an EDB subgoal e_q,
and that f is a containment mapping from a right-linear tree T5 of depth at least 1 into T1.
Then f(A) = A or f(A) = A'.

Proof. Assume that A appears in the kth position in e_q. Since T5 has depth at least 1, the
atom e_q appears as a leaf. Because there are no repetitions of any EDB predicate in the
subgoals of the recursive rule, the only e_q-atoms in T1 are e_q and p(1)e_q; the kth argument
of the former is A and the kth argument of the latter is A'.

e_q[k] = A (in T5)    e_q[k] = A (in T1)    p(1)e_q[k] = A' (in T1)

Considering all possible destinations in T1 for the e_q-leaf in T5 suffices to complete the proof.

□

Figure 3.19: Unacceptable containment mapping


Recall our convention that t[i] refers to the ith argument of the atom t.

Lemma 3.11 Assume that V is minimal, and that f is a containment mapping from T3
into T1 under which the destination of p(1) is p(2) (see Figure 3.19). Then p(1) is an adjunct
to p(2).

Proof. Assume that a distinguished variable Xi appears in some position j in p(1). By
Lemma 3.7 and the assumed mapping f(p(1)) = p(2), p(2)[j] is Xi. To complete the proof,
we show that if any nondistinguished nonlocal variable A appears in any position i in p(1),
then A also occupies position i in p(2). Assume the existence of a nondistinguished variable
A in the ith argument position in p(1), such that A appears in p(2), or in some e_q.

If A appears as an argument to some e_q, then by Lemma 3.10, f(A) is A or A'. Since
no primed variable appears in p(2), the mapping f(p(1)) = p(2) forces p(2)[i] = A.

Assume now that A appears in the jth position in the arguments of p(2). The picture is

p(1)[i] = A    p(1)[j] = B    p(2)[i] = C    p(2)[j] = A

Now, B cannot be distinguished, otherwise by previous discussion, A would also have to be
distinguished. By the assumed mapping on p(1), we have f(A) = C. Since the jth arguments
of p(1)p(1) and p(1)p(2) are both primed variables, we conclude that f(p(2)) = p(2). Then
f(A) = A and C = A. □

Connectivity

It turns out that if V is basis-linearizable and not minimal, then the p-atoms p(1) and p(2) in
the body of the recursive rule must be connected. The primary tools used in the remainder
of the proof are connectivity and safety.

Two atoms in the body of the recursive rule ri are said to be directly connected if they
share a nondistinguished variable; connectivity is defined as the transitive closure of the
direct connectivity relation. Similarly, two nondistinguished variables are connected if they
appear in a pair of connected atoms. The special case of connectivity between the recursive
atoms p(1) and p(2) is formally defined below; the formality is necessary to the results of the
remainder of this chapter.

Definition 3.6 Assume that the nondistinguished variable A appears in the arguments of
p(1), and that the nondistinguished variable B appears in the arguments of p(2). We say
that A and B are directly connected if A = B. The atoms p(1) and p(2) are said to be directly
connected if they share a nondistinguished variable A. We say that A and B are indirectly
connected if A ≠ B, and if for some n > 0, there are nondistinguished variables
U_1, ..., U_(n+1) and distinct EDB subgoals e_k1, ..., e_kn such that

1. A = U_1.

2. B = U_(n+1).

3. For 1 ≤ i ≤ n, the variables U_i and U_(i+1) appear in the arguments of e_ki.

The sequence <e_k1, ..., e_kn> is termed a connection sequence between A and B, and the
sequence <U_1, ..., U_(n+1)> is termed the corresponding variable sequence.

□
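The search for a connection sequence in Definition 3.6 can be sketched in Python. This is only an illustration under assumptions that are not part of the text: atoms are encoded as (predicate, argument-tuple) pairs, and the names connection_sequence, edb and nondist, as well as the example rule body, are hypothetical. Breadth-first search returns a shortest chain, so no subgoal is used twice in the sequence it reports.

```python
from collections import deque


def connection_sequence(a, b, edb_subgoals, nondistinguished):
    """Return indices of EDB subgoals connecting variable a to variable b.

    [] means a == b (direct connection); None means a and b are not connected.
    """
    if a == b:
        return []
    # came_from[w] = (previous variable, index of the subgoal shared with it)
    came_from = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for k, (_pred, args) in enumerate(edb_subgoals):
            if u not in args:
                continue
            for w in args:
                # Only nondistinguished variables may link two subgoals.
                if w not in nondistinguished or w in came_from:
                    continue
                came_from[w] = (u, k)
                queue.append(w)
    if b not in came_from:
        return None
    # Walk back from b to a, collecting the subgoals used along the way.
    sequence = []
    w = b
    while came_from[w] is not None:
        w, k = came_from[w]
        sequence.append(k)
    return sequence[::-1]


# Hypothetical rule body: p(A, X2), e1(A, U), e2(U, B), e3(X1, Z), p(X1, B)
edb = [("e1", ("A", "U")), ("e2", ("U", "B")), ("e3", ("X1", "Z"))]
nondist = {"A", "U", "B", "Z"}
print(connection_sequence("A", "B", edb, nondist))
```

In this toy body, A and B are indirectly connected through e1 and e2 (variable sequence A, U, B), while Z is not connected to A because the only variable it shares a subgoal with is distinguished.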

As we mentioned earlier, nondistinguished variables in ri will be consistently renamed
in each tree by priming. That is, if A is a nondistinguished variable, then its occurrences
in any tree will be of the form A^(i), where i is an integer indicating a superscript of i
primes. Further, occurrences of the nondistinguished variables A and B in sibling atoms
will bear the same superscript. If the tree is linear, then A^(i) is the occurrence of A at
depth i + 1 in the tree.

Let f be a containment mapping from a right-linear expansion R of depth at least 1
into an expansion S. Recall that if f is a mapping from some tree S into some tree T, the
notation f(q) = w implies that q is a leaf in S and w is a leaf in T.
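To make the notation f(q) = w concrete, the following Python sketch checks whether a variable map sends every leaf of one tree onto some leaf of another while fixing the distinguished variables. It assumes the standard notion of a containment mapping from conjunctive-query containment; the precise definition used in this chapter is given earlier and may differ in detail. The encoding of leaves as (predicate, argument-tuple) pairs and all names below are hypothetical.

```python
def is_containment_mapping(f, source_leaves, target_leaves, distinguished):
    """Check that f maps each source leaf onto a target leaf.

    f is a dict on variables; variables missing from f map to themselves.
    """
    # Distinguished variables must be preserved.
    if any(f.get(x, x) != x for x in distinguished):
        return False
    targets = set(target_leaves)
    for pred, args in source_leaves:
        image = (pred, tuple(f.get(v, v) for v in args))
        if image not in targets:
            return False   # this leaf has no legal destination
    return True


# Hypothetical leaves of two small trees over the same head p(X1, X2).
R_leaves = [("e", ("X1", "U")), ("p", ("U", "X2"))]
S_leaves = [("e", ("X1", "A")), ("p", ("A", "X2"))]
print(is_containment_mapping({"U": "A"}, R_leaves, S_leaves, {"X1", "X2"}))
```

The arguments in this section repeatedly rule out a candidate destination by exhibiting one argument position where no leaf of the target tree carries the required variable; the loop above fails in exactly that situation.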

Lemma 3.12 Assume that the nondistinguished variables V and W appear in the arguments
of some EDB predicate e_q. For any i, assume that f is a containment mapping from
a right-linear tree R of depth at least (i + 1) into an expansion S, and let V^(i) and W^(i) be
the respective occurrences of V and W at depth (i + 1) in R. Then, there is some j such
that f(V^(i)) = V^(j) and f(W^(i)) = W^(j).

Proof. Assume that V and W appear in the positions k and l in e_q, respectively. The
kth and lth arguments of the e_q-atom at depth (i + 1) in R are V^(i) and W^(i) respectively.
Since there are no EDB repetitions in the recursive rule, every e_q-atom in S has kth and
lth arguments V^(j) and W^(j) respectively, for some j. □

Lemma 3.13 Assume that R is a right-linear tree of depth m, S is any expansion and f
is a containment mapping from R into S. Assume that the nondistinguished variable A in
p(1) is connected to the variable B in p(2). Then for 0 ≤ i < m, exactly one of the following
is true.

1.  A  =  B  and 

2. A ≠ B, and there is some j such that f(A^(i)) = A^(j) and f(B^(i)) = B^(j).

Proof. If A and B are directly connected, then A = B and (1) above is true. If A and B
are indirectly connected, then A ≠ B.

Let

<e_k1, ..., e_kn>

be a connection sequence between A and B, and let

<U_1, ..., U_(n+1)>

be the corresponding variable sequence. Recall that A = U_1 and B = U_(n+1). We show that
(2) above is true.

Consider the possible destinations for the atom p(2)^i e_k1, which contains the variables
A^(i) and U_2^(i). By Lemma 3.12, there is some j such that f(A^(i)) = A^(j) and
f(U_2^(i)) = U_2^(j). If U_2 = B, then the result follows. Otherwise, the variables U_2^(i) and
U_3^(i) appear in the leaf p(2)^i e_k2; the use of Lemma 3.12 and the single-valuedness of f
shows that f(U_3^(i)) = U_3^(j).

An inductive repetition suffices to complete the proof. □

Note  that  the  assumption  that  there  are  no  EDB  repetitions  in  the  body  of  the  recursive 
rule  is  essential  to  the  proof  of  Lemmas  3.12  and  3.13. 

In the remainder of this subsection, we will show that if V is basis-linearizable, but p(1)
and p(2) are not connected, then V is minimal. Recall that V is termed minimal if the
small non-right-linear tree T1 is contained in a right-linear tree of height at most one (that
is, in one of T2 and T3).

Lemma 3.14 If V is basis-linearizable and not connected, then V is minimal.

Proof. Assume that f is a containment mapping from some right-linear tree T5 of height
n > 1 into T1. We construct a containment mapping g from T3 into T1. We define the
destinations of g as follows: g preserves the destination of p(1), and of any e_q connected to
p(1); all other atoms are mapped to "themselves" under g. More formally, we partition
nondistinguished variables into two classes. A nondistinguished variable is termed tied if it
appears in p(1) or in any predicate that is connected to p(1), and free otherwise. We define
g as under.

g(V) = V       if V is distinguished
g(V) = f(V)    if V is tied
g(V) = V       if V is free

It is easily seen that g is a containment mapping from the depth-1 expansion T3 into T1
(see Figure 3.20). □
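The construction of g in the proof of Lemma 3.14 can be sketched as follows. This is a minimal illustration under assumptions that are not part of the text: atoms are (predicate, argument-tuple) pairs, f is given as a dictionary on the unprimed rule variables, and the names tied_variables and restrict_mapping are hypothetical.

```python
def tied_variables(p1_args, edb_subgoals, nondistinguished):
    """Nondistinguished variables in p(1) or in any subgoal connected to p(1)."""
    tied = {v for v in p1_args if v in nondistinguished}
    changed = True
    while changed:                      # propagate until a fixed point is reached
        changed = False
        for _pred, args in edb_subgoals:
            local = {v for v in args if v in nondistinguished}
            if local & tied and not local <= tied:
                tied |= local
                changed = True
    return tied


def restrict_mapping(f, p1_args, edb_subgoals, nondistinguished):
    """Build g: follow f on tied variables, identity on free variables.

    Distinguished variables are left out of the dict and map to themselves.
    """
    tied = tied_variables(p1_args, edb_subgoals, nondistinguished)
    return {v: (f.get(v, v) if v in tied else v) for v in nondistinguished}


# Hypothetical rule body: p(A, X2), e1(A, U), e2(W, X1), p(X1, B)
p1_args = ("A", "X2")
edb = [("e1", ("A", "U")), ("e2", ("W", "X1"))]
nondist = {"A", "U", "W", "B"}
f = {"A": "A'", "U": "U'", "W": "W'", "B": "X1"}
g = restrict_mapping(f, p1_args, edb, nondist)
print(g)
```

Here A and U are tied, so g follows f on them; W and B are free and are sent to themselves, which is possible precisely because p(2) is not connected to p(1) in this example.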


Non-minimal  programs 

Finally, let us turn our attention to the case in which the minimal violation of right-linearity
(T1 in Figure 3.21) is contained in a right-linear expansion T5 of depth at least 2 (see Figure
3.21).

Assume that V is basis-linearizable, and that f is a containment mapping from T5 into
T1. By a case analysis on the possible destinations for the leaf p(1) (at depth 1) in the
right-linear tree T5, we will show the following.

1. If f(p(1)) = p(1)p(1) and V is not minimal, then T5 has depth 2, f(p(2)p(1)) = p(1)p(2),
and f(p(2)p(2)) = p(2); f is an acceptable containment mapping. (Recall that an
expression of the form "f(t) = s" means that t is a leaf in the containing tree, and s
is a leaf in the contained tree.) Figure 3.22 describes this claim.

2. If f(p(1)) = p(1)p(2) or f(p(1)) = p(2) (see Figure 3.23), then p(1) is an adjunct to p(2),
or V is minimal.

Figure 3.21: Assume T1 ⊆ T5

Let us end this section with some observations on the nature of the containment mapping f.

Lemma 3.15 If the nondistinguished variable A appears in some e_q, then f(A) = A or
f(A) = A', and f(A') = A or f(A') = A'.

Proof. Assume the presence of the atom e_q with

e_q[i] = A

in ri. Since T5 is of depth at least 2, the two leaves

e_q[i] = A    p(2)e_q[i] = A'

must appear in it. Since there are no subgoal repetitions in ri, the only e_q-atoms in T1 are
of the form

e_q[i] = A    p(1)e_q[i] = A'

□

Lemma 3.16 Assume that the following variables appear in the indicated positions in ri:

p(1)[i] = C    p(2)[i] = A    e_q[j] = Xi

Then f(A) is Xi or C.

Proof. Since T5 is of height at least 2, the atom p(2)e_q exists in T5, with p(2)e_q[j] = A.
The only e_q-atoms in T1 are e_q and p(1)e_q, whose jth arguments are Xi and C
respectively. □

Figure 3.22: The case f(p(1)) = p(1)p(1). (a) Assumption. (b) Conclusion.

Figure 3.23: The cases f(p(1)) = p(1)p(2) and f(p(1)) = p(2).
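The argument of Lemma 3.16 rests on how the arguments of a child atom are obtained from its parent: head variables are replaced by the parent's arguments, and nondistinguished variables are primed. The following Python sketch illustrates this on a toy instance; the tuple encoding, the function name child_args and the concrete rule are assumptions made for the example, and positions in the code are 0-indexed whereas the text counts from 1.

```python
def child_args(parent_args, body_args, head_vars, primes):
    """Arguments of a body atom when the rule is applied at an atom with parent_args."""
    binding = dict(zip(head_vars, parent_args))
    return tuple(binding[v] if v in binding else v + "'" * primes
                 for v in body_args)


# Hypothetical instance of the picture in Lemma 3.16 with i = 1 and j = 2.
head = ("X1", "X2")
p1 = ("C", "X2")     # p(1)[1] = C
p2 = ("A", "X2")     # p(2)[1] = A
eq = ("U", "X1")     # e_q[2] = X1

p2_eq = child_args(p2, eq, head, primes=1)   # the leaf p(2)e_q in T5
p1_eq = child_args(p1, eq, head, primes=1)   # the leaf p(1)e_q in T1
print(p2_eq, p1_eq)
```

The second argument of p(2)e_q is A, and the only e_q-atoms available as destinations carry X1 and C in that position, which is the case split stated by the lemma.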

The case f(p(1)) = p(1)p(1). Assume that V is basis-linearizable. Then the minimal
violation T1 of right-linearity (see Figure 3.24) is contained in a right-linear expansion T5.
Assume that T5 has depth at least 2, and that f is a containment mapping from T5 into
the minimal violation T1, such that f(p(1)) = p(1)p(1) (see Figure 3.24(a)). We will
show that V is minimal, or that f(p(2)p(1)) = p(1)p(2), T5 has height 2 and f(p(2)p(2)) = p(2)
(see Figure 3.24(b)).

If p(1) and p(2) are not connected, then V is minimal by Lemma 3.14. Assume that p(1)
and p(2) are connected. The mapping f(p(1)) = p(1)p(1) yields the following results.

Definition 3.7 We say that p(1) is invariant if whenever any distinguished variable Xi
appears in it, then p(1)[i] is Xi. That is, Xi (also) appears in its "home" position in p(1).
□

Lemma 3.17 Assume that p(1) is invariant. If Xi appears in any position j in p(1), then
p(1)p(1)[i] is Xi.

Proof. By invariance, p(1)[i] = Xi, and our result follows immediately. □
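Invariance (Definition 3.7) is a purely syntactic test on the arguments of p(1), and a small Python sketch may help fix the idea. The encoding of argument lists as tuples of variable names and the helper names is_invariant and unfold_p1_under_p1 are assumptions made for illustration; positions are 0-indexed in the code.

```python
def is_invariant(p1_args, head_vars):
    """True if every distinguished variable occurring in p(1) also sits at home."""
    home = {x: k for k, x in enumerate(head_vars)}
    return all(p1_args[home[v]] == v for v in p1_args if v in home)


def unfold_p1_under_p1(p1_args, head_vars):
    """Arguments of the leaf p(1)p(1): substitute p(1)'s arguments for the head
    variables and prime the nondistinguished variables once."""
    binding = dict(zip(head_vars, p1_args))
    return tuple(binding.get(v, v + "'") for v in p1_args)


head = ("X1", "X2", "X3")
print(is_invariant(("X1", "A", "X1"), head))        # X1 occurs twice, once at home
print(is_invariant(("A", "X1", "X3"), head))        # X1 occurs, but not at home
print(unfold_p1_under_p1(("X1", "A", "X1"), head))
```

In the first example X1 survives unchanged in p(1)p(1) at every position where it occurs in p(1), which is the content of Lemma 3.17.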

Lemma 3.18 If f(p(1)) = p(1)p(1), then p(1) is invariant, and f(A) = A' for all
nondistinguished variables A in p(1). Further, any variable B in p(2) that is connected to a
variable A in p(1) also satisfies f(B) = B'.

Proof. Assume that Xi appears in position j ≠ i in p(1). The picture is

p(1)[i] = C    p(1)[j] = Xi

The jth argument of p(1)p(1) is C. By the assumed mapping f(p(1)) = p(1)p(1), f(Xi) = C;
since f(Xi) = Xi, our result follows. Otherwise, if Xi appears in p(1), then it occupies
position i. Hence, p(1) is invariant.

If the nondistinguished variable A appears in the ith position in p(1), then A' appears in
the ith position in p(1)p(1); hence, f(A) = A'. If a nondistinguished variable B appearing
in p(2) is connected to A, then f(B) = B' by Lemma 3.13. □

Figure 3.24: The case f(p(1)) = p(1)p(1). (a) Assumption. (b) Conclusion.

Proving that f(p(2)p(1)) = p(1)p(2). We now prove that f(p(2)p(1)) = p(1)p(2). Recall
that f is a containment mapping from a right-linear expansion T5 of depth at least 2 into
the minimal violation T1, such that f(p(1)) = p(1)p(1).

Definition 3.8 Define a Class 1 variable to be a distinguished variable Xj that appears
only among the arguments of p(2) (that is, Xj does not appear in the arguments of p(1) or
any e_q). Define a Class 2 variable to be a Class 1 variable Xj that appears in some position
k ≠ j in the arguments of p(2), but does not appear in position j (its "home" position).
Define a Class 3 variable to be a Class 2 variable Xk, such that the kth argument of p(2) is
some Class 2 variable Xj. That is, the arguments of p(2) are as follows, where C ≠ Xj and
Xj ≠ Xk.

p(2)[j] = C    p(2)[k] = Xj    p(2)[i] = Xk

For 1 ≤ i ≤ 3, an argument position l is termed Class i iff p(2)[l] is a Class i variable. □
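The three classes of Definition 3.8 can be computed directly from the rule body. The sketch below is an illustration only: the tuple encoding of atoms, the function names classify and class_positions, and the example rule (which ignores safety for brevity) are assumptions, and positions are 0-indexed in the code.

```python
def classify(head_vars, p1_args, p2_args, edb_subgoals):
    """Return the sets of Class 1, Class 2 and Class 3 variables."""
    home = {x: k for k, x in enumerate(head_vars)}
    # Variables that occur outside p(2): in p(1) or in some EDB subgoal.
    elsewhere = set(p1_args).union(*(args for _pred, args in edb_subgoals))
    class1 = {x for x in head_vars if x in p2_args and x not in elsewhere}
    # Class 2: Class 1 and absent from its home position in p(2).
    class2 = {x for x in class1 if p2_args[home[x]] != x}
    # Class 3: Class 2, with a Class 2 variable sitting in its home position.
    class3 = {x for x in class2 if p2_args[home[x]] in class2}
    return class1, class2, class3


def class_positions(p2_args, variables):
    """Positions l such that p(2)[l] belongs to the given class of variables."""
    return {l for l, v in enumerate(p2_args) if v in variables}


head = ("X1", "X2", "X3", "X4")
p1 = ("A", "B", "X3", "D")
p2 = ("X2", "X1", "C", "X4")
edb = [("e", ("A", "C"))]
c1, c2, c3 = classify(head, p1, p2, edb)
print(c1, c2, c3, class_positions(p2, c3))
```

In this example X1 and X2 swap home positions inside p(2), so both are Class 3, and the Class 3 positions are exactly the home positions of the Class 3 variables.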

Lemma 3.19 Let l be a Class 3 position. Then Xl is a Class 3 variable.

Proof. Assume that Xk is a Class 3 variable appearing as the lth argument of p(2). By
definition, k ≠ l. The picture is as follows, where Xj and Xk appear only in p(2), j ≠ k, k ≠ l
and C ≠ Xj.

p(2)[j] = C    p(2)[k] = Xj    p(2)[l] = Xk

By safety, Xl appears in p(1), p(2) or some e_q. Since Xk does not appear in p(1) and
Xk ≠ Xl, Xl cannot appear in any e_q by Lemma 3.16. Assume now that Xl appears in
p(1); by invariance, it must occupy position l. Since T5 has depth at least 2, one of the
atoms p(2)p(2) or p(2)p(2)p(1) must appear as a leaf, and the lth argument of each of these
atoms is Xj. Since Xj does not appear in p(1), it cannot appear in either of the leaves
p(1)p(1) or p(1)p(2) in T1; further, the lth argument of p(2) is Xk, which is distinct from Xj
by assumption. Hence, neither of the atoms p(2)p(2) and p(2)p(2)p(1) has a legal destination
in T1. Hence, by safety, Xl must appear only in p(2) and our result follows. □

Lemma 3.20 For all i, i is a Class 3 position iff Xi is a Class 3 variable.

Proof.  The  “only  if”  follows  from  Lemma  3.19.  For  the  converse,  note  that  by  definition,  if 
there  are  k  Class  3  variables  then  there  are  at  least  k  Class  3  positions;  our  result  follows 
by  Lemma  3.19  and  by  pigeonholing,  recalling  the  fact  that  all  distinguished  variables  are 
distinct.  □ 

Lemma 3.21 The arguments of p(1) and p(2) cannot be of the form

p(1)[i] = A    p(2)[j] = B    p(2)[k] = Xj

where A and B are connected nondistinguished variables and Xj appears only in p(2).

Proof. Assume the converse. Since B is nondistinguished, j ≠ k; that is, Xj is a Class 2
variable. By Lemma 3.18, f(B) = B', and p(1) is invariant. By safety, Xk must appear in
p(1), p(2) or some e_q.

Since Xj does not appear in p(1), Lemma 3.16 prohibits Xk from appearing in any e_q. If
Xk appears only in p(2), then it is a Class 3 variable; by Lemma 3.20, k must be a Class 3
position, and Xj a Class 3 variable; but p(2)[j] is nondistinguished, a contradiction. Hence,
Xk must appear in the arguments of p(1), and must occupy position k by invariance. The
picture is

p(1)[i] = A    p(1)[k] = Xk    p(2)[j] = B    p(2)[k] = Xj

Now, one of p(2)p(2) and p(2)p(2)p(1) must appear as a leaf in T5, and B appears as the
kth argument of each of these leaves. However, the kth argument of each leaf in T1 is a
nonprimed variable, contradicting the fact that f(B) = B'. □


Lemma 3.22 Assume that the arguments of p(1) and p(2) are as follows

p(1)[i] = A    p(2)[j] = B

where A and B are connected nondistinguished variables. Then p(1)[j] is Xj.

Proof. By Lemma 3.18 and connectivity, f(B) = B'. By safety, Xj must appear in the
body. However, by Lemma 3.16, it cannot appear in e_q. By Lemma 3.21, Xj cannot appear
only in p(2). Hence, Xj must appear in p(1), and the result follows by invariance. □


Lemma 3.23 Let f be a containment mapping from a right-linear expansion T5 of depth at
least 2 into the minimal violation T1, and assume that V is not minimal. Then
f(p(2)p(1)) = p(1)p(2).

Proof. Since V is not minimal, p(1) and p(2) must be connected. By Lemma 3.22, we must
have the following situation, where A and B are connected nondistinguished variables.

p(1)[i] = A    p(1)[j] = Xj    p(2)[j] = B

By Lemma 3.18, f(A) = A' and f(B) = B'. Now, the jth argument of p(2)p(1) is B, but
the jth arguments of the leaves p(1)p(1) and p(2) in T1 are not B'; that is, the p-leaves in T1
are as below.

p(1)p(1)[j] = Xj    p(1)p(2)[j] = B'    p(2)[j] = B

Hence, f(p(2)p(1)) = p(1)p(2). □




CHAPTER  3.  A  DECISION  PROCEDURE  FOR  BASIS-LINEARIZABILITY 


T5 has height 2. Recall that a Class 1 variable is a distinguished variable that appears
only in the arguments of p(2), and that any position in p(2) that is occupied by a Class 1
variable is a Class 1 position. It turns out that for all i, Xi is a Class 1 variable iff i is a
Class 1 position.

Lemma 3.24 For all i, if i is a Class 1 position, then Xi is a Class 1 variable.

Corollary. i is a Class 1 position iff Xi is a Class 1 variable.

Proof. Assume that i is a Class 1 position; that is, p(2)[i] is a variable Xk that appears
only in p(2). If k = i then the result follows. Assume k ≠ i. By safety, Xi appears in
p(1), some e_q or p(2). By Lemma 3.16 and since Xk (by assumption) does not appear in p(1),
Xi cannot appear in any e_q. If Xi appears in p(1), then by invariance it occupies position
i. But then, the ith argument of p(2)p(1) is Xk, which by assumption does not appear in
p(1) (and therefore does not appear in p(1)p(2)); hence, f(p(2)p(1)) ≠ p(1)p(2), contradicting
Lemma 3.23. Therefore, Xi must appear only in p(2).

To prove the corollary, we proceed as follows. By definition, if there are k Class 1
variables, then there are at least k Class 1 positions. By the preceding result, the number
of Class 1 variables is no smaller than the number of Class 1 positions; thus, the number of
Class 1 variables is the same as the number of Class 1 positions. The corollary now follows
by the preceding result and pigeonholing. □

Now, we show that if A and B are connected nondistinguished variables, then f(A') = A
and f(B') = B. The idea is that we know that p(2)p(1)[i] = A', and that f(p(2)p(1)) =
p(1)p(2); hence, we need only discover the value of p(1)p(2)[i].

Lemma 3.25 Let f be a containment mapping as before, and assume the following picture

p(1)[i] = A    p(2)[j] = B

where A and B are connected nondistinguished variables. Then, either A = B and p(1)p(2)[i]
is Xj, or p(1)p(2)[i] is A.

Proof. By Lemma 3.23, we have the picture

p(1)[i] = A    p(1)[j] = Xj    p(2)[j] = B

Now, the jth argument of the atoms p(2)p(2) and p(2)p(2)p(1) is B', and one of these atoms
appears as a leaf in T5. The jth arguments of the leaves in T1 are Xj, B' and B, so f(B') is
one of Xj, B' and B. By connectivity (Lemma 3.13), if f(B') = Xj, then A = B and f(A')
is Xj; if f(B') = B then f(A') = A; and if f(B') = B' then f(A') = A'.

Consider the case f(B') = B'; then, by connectivity, we must have f(A') = A'. Now, we
know by Lemma 3.23 that f(p(2)p(1)) = p(1)p(2), and A' appears as the ith argument
of the former. If p(1)p(2)[i] is A', then the variable A appears in the ith position in p(1) and
p(2). The picture is

p(1)[i] = A    p(2)[i] = A

By safety, Xi must appear in the rule body. By Lemma 3.16, if Xi appears in any e_q, then
f(A) is one of Xi and A, contradicting Lemma 3.18. By invariance (since A is
nondistinguished), Xi does not appear in p(1). However, Xi cannot appear only in p(2), since
it would be a Class 1 variable and p(2)[i] is not Class 1. Hence, f(B') ≠ B' and our result
follows. □


Lemma 3.26 Assume the following arguments for p(1) and p(2)

p(1)[i] = A    p(2)[j] = A

where A is a nondistinguished variable. Then p(1)p(2)[i] ≠ Xj.

Proof. Assume the converse. Then by assumption and invariance, the picture is

p(1)[i] = A    p(1)[j] = Xj    p(1)[k] = Xj    p(2)[i] = Xk    p(2)[j] = A

where i ≠ j, i ≠ k since A is nondistinguished.

By Lemma 3.18, f(A) = A', and Lemma 3.23 requires that f(p(2)p(1)) = p(1)p(2); since
the jth argument of p(2)p(1) is A, we conclude that p(1)p(2)[k] = A' and p(2)[k] = A.

Now, by invariance, Xi cannot appear in p(1). If Xi appears in any e_q, then by Lemma
3.16, we must have f(Xk) = A or f(Xk) = Xi, a contradiction since i ≠ k. Hence, Xi
appears only in p(2), say in position l distinct from i, j and k. We now have the picture

p(1)[i] = A    p(1)[j] = Xj    p(1)[k] = Xj

p(2)[i] = Xk    p(2)[j] = A    p(2)[k] = A    p(2)[l] = Xi

where i ≠ j, i ≠ k, i ≠ l, j ≠ l. However, by the corollary to Lemma 3.24 and since Xi is
Class 1, p(2)[i] (Xk) must also be Class 1; but p(2)[k] is nondistinguished (and hence not
Class 1), contradicting Lemma 3.24. □


Lemma 3.27 Assume that the p-atoms in V are of the form

p(1)[i] = A    p(2)[j] = B

where A and B are connected nondistinguished variables. Then f(A') = A and f(B') = B.

Proof. By Lemmas 3.23, 3.25 and 3.26, and by connectivity. □


Lemma 3.28 T5 has height 2.

Proof. Assume that A and B are connected nondistinguished variables appearing in p(1)
and p(2) respectively. The picture is

p(1)[i] = A    p(2)[j] = B

By Lemma 3.22, p(1)[j] = Xj. Further, by Lemmas 3.23 and 3.27, p(2)[i] = Xl for some l
such that p(1)[l] = A. Thus, we have the picture

p(1)[i] = A    p(1)[j] = Xj    p(1)[l] = A    p(2)[i] = Xl    p(2)[j] = B

where A and B are connected, and where Xl ≠ Xj. We assume T5 has height at least 3,
and force a contradiction.

Now, p(2)p(2)p(1)[i] = A'' and p(2)p(2)p(1)[j] = B'. The mapping f(B') = B forces
f(p(2)p(2)p(1)) = p(2), yielding f(A'') = Xl. By connectivity, A = B. But then the jth
argument of each of p(2)p(2)p(2) and p(2)p(2)p(2)p(1) is A'', and Xl does not appear in position
j in any leaf in T1; that is, T5 cannot have depth 3 or depth greater than 3, a contradiction.
□


Hence, p(2)p(2) is a leaf in T5. Let A and B be as in the proof of the preceding lemma.
By Lemma 3.27, f(B') = B. Now, B' appears in the jth position in p(2)p(2) (in T5), and B
appears in the jth position in p(2) (in T1). However, the jth arguments of p(1)p(1) and
p(1)p(2) are Xj and B' respectively.

Hence, we may conclude that f(p(2)p(2)) = p(2).

Cyclic programs. Before we proceed to the cases f(p(1)) = p(1)p(2) and f(p(1)) = p(2),
we will investigate the behaviour of cyclic programs. Recall that a Class 2 variable is
a distinguished variable Xj that appears only among the arguments of p(2), such that
p(2)[j] ≠ Xj; that is, Xj appears "out of position." Recall also that a position i is termed
Class 2 iff p(2)[i] is a Class 2 variable. We say that the program V is cyclic if for all j, if Xj
is a Class 2 variable, then j is a Class 2 position.
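The cyclicity condition just defined is again a syntactic check on the recursive rule. The following Python sketch shows one way it might be tested; the tuple encoding of atoms and the name is_cyclic are assumptions made for this example, the toy rules ignore safety, and positions are 0-indexed in the code.

```python
def is_cyclic(head_vars, p1_args, p2_args, edb_subgoals):
    """True if the home position of every Class 2 variable is a Class 2 position."""
    home = {x: k for k, x in enumerate(head_vars)}
    # Variables that occur outside p(2): in p(1) or in some EDB subgoal.
    elsewhere = set(p1_args).union(*(args for _pred, args in edb_subgoals))
    # Class 2: occurs only in p(2), and not in its home position there.
    class2 = {x for x in head_vars
              if x in p2_args and x not in elsewhere and p2_args[home[x]] != x}
    return all(p2_args[home[x]] in class2 for x in class2)


head = ("X1", "X2", "X3")
p1 = ("A", "B", "D")
edb = [("e", ("A", "C"))]
print(is_cyclic(head, p1, ("X2", "X1", "C"), edb))   # X1 and X2 exchange places
print(is_cyclic(head, p1, ("B", "X1", "C"), edb))    # home of X1 holds B
```

When there are no Class 2 variables at all the condition holds vacuously, so such a program counts as cyclic under this reading of the definition.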

Assume that the arguments of the p-atoms in V are as follows, where A and B are
connected nondistinguished variables (not necessarily distinct), C and D are arbitrary
nondistinguished variables and i need not be distinct from j.

p(1)[i] = A    p(1)[j] = D    p(2)[i] = C    p(2)[j] = B

We will show that if V is cyclic and basis-linearizable, and has such arguments, then V is
minimal. For this purpose, we will investigate the containment of the complete expansion
of depth 2 (expansion T6 in Figure 3.25) in a right-linear expansion; that is, for the purpose
of this subsection, we will not be considering the minimal violation of right-linearity.

Consider the complete expansion T6 of depth 2, illustrated in Figure 3.25. We will
assume that each nondistinguished variable V is renamed to V' in the children of p(1), and
to V'' in the children of p(2), as indicated in the figure.

Assume that V is basis-linearizable. Then, T6 is contained in some right-linear expansion
T5 (see Figure 3.25). Assume that f is a containment mapping proving the containment.
Since a nondistinguished variable appears in at least one position in each p-leaf in T6, T6 is
clearly not contained in the depth-0 expansion p(X).

Figure 3.25: Trees T6 and T5

The following lemma shows that, if V is cyclic, then T6 is not contained in any
right-linear expansion of depth greater than 1.

Lemma 3.29 Assume that the p-atoms in V are as described above, and that V is
basis-linearizable and cyclic. Then the expansion T6 is not contained in a right-linear expansion
of depth greater than 1.

Proof. Assume that f is a containment mapping from some right-linear expansion T5 of
depth at least 2 into T6; we force a contradiction.

The ith argument of the leaf p(1) in T5 is A, and the ith arguments of the p-leaves in T6
are A', C', A'' and C''; hence, f(A) is one of these four variables. By connectivity, f(B) is
one of A', C', A'', C'', B' and B''.

By safety, the distinguished variable Xj must appear in the body of the recursive rule.
Assume Xj appears in some e_q, in position k say. Since T5 has depth at least 2, the leaf
p(2)e_q appears in it, with kth argument B. The only possible destinations for this leaf in
T6 are the atoms p(1)e_q, p(2)e_q and e_q; however, the kth argument of each of these atoms is
a nonprimed variable, a contradiction.

Assume that Xj appears in the arguments of p(1), in position k say. The kth arguments of
the leaves p(1)p(1) and p(2)p(1) are D and B respectively, both of which are nondistinguished;
hence, we must have f(p(1)) = p(1)p(2) or f(p(1)) = p(2)p(2), and the kth argument of p(1)p(2)
or p(2)p(2) must be the distinguished variable Xj. Hence, p(2)[k] is some distinguished
variable Xl. The picture is

p(1)[i] = A    p(1)[j] = D    p(1)[k] = Xj    p(2)[i] = C    p(2)[j] = B    p(2)[k] = Xl

Now, since T5 has depth at least 2, the leaf p(2)p(1) must appear in it. The kth argument
of this leaf is B; however, the kth argument of each leaf in T6 is a nonprimed variable, and
p(2)p(1) has no legal destination.

Thus, Xj must appear only in p(2), in position k say. Since p(2)[j] is nondistinguished,
k ≠ j and Xj is Class 2, which violates the assumed cyclicity of V since p(2)[j] is
nondistinguished and hence is not Class 2.

□

Now, consider the case in which T6 is contained in the right-linear expansion T3 of depth
1. Let f be a containment mapping from T3 into T6. We say that f is normalised iff the
following conditions all hold.

1. The destinations of the leaves p(1) and p(2) in T3 have the same parent in T6. That is,
both p-leaves are mapped to the same "side" of the complete tree T6.

2. The destination of each e_q-leaf in T3 is either the e_q-leaf at depth 1 in T6, or the e_q-leaf
in T6 with the same parent as the destinations of the p-leaves in T3.

Figure 3.26: Normalised mapping

The following is the analog of Lemma 3.14 in Section 3.6.3.

Lemma 3.30 Assume that the arguments of the p-atoms in V are as described above, and
that T6 ⊆ T3. Then, the containment is provable by a normalised mapping.

Proof.  Let  /  be  any  mapping  proving  the  containment;  we  construct  a  normalised  mapping  g 
from  /.  If  /(?^(i))  is  one  of  andp(i)P(2))  th^n  f{A)  is  one  of  A'  and  C";  by  connectivity. 

f(B)  must  be  A',C'  or  B'.  However,  the  jth.  argument  of  each  of  P(2)P(i)  and  P(2)P(2) 
a  double-primed  variable;  hence,  we  must  have  /(p(2))  =  l>(i)P(i)  or  /(p(2))  =  P(i)P(2)- 
Similarly,  if  /(p(i))  is  one  of  P(2)P(1)  or  P(2)P(2))  then  /(p(2))  must  also  be  one  of  these 
atoms. 

Partition the nondistinguished variables in T3 into two classes. Tied variables are
variables that appear in p(1) or p(2), or in any ei that is connected to p(1) or p(2); free variables
consist of the remainder. Define the function g as follows.

          V       if V is distinguished
g(V) =    f(V)    if V is tied
          V       if V is free

That is, g preserves the destinations of f for all atoms that are connected to p(1) or p(2), and
maps all other atoms to "themselves" in Te (see Figure 3.26). g is a normalised containment
mapping from T3 into Te. □

Lemma 3.31 Assume that the arguments of P are as described above, and that f is a
containment mapping from T3 into Te such that the destination of the leaf p(1) in T1 is a
child of the atom p(1) in Te; that is, f(p(1)) = p(1)p(1) or f(p(1)) = p(1)p(2). Then P is
minimal.

Figure 3.27: One-boundedness

Proof. Construct a normalised containment mapping g from T3 into Te; g is a containment
mapping from T3 into the minimal violation T1 of right-linearity. □

Lemma 3.32 Assume that the arguments of P are as described above, and that f is a
containment mapping from T3 into Te such that the destination of the leaf p(1) in T1 is
a child of the atom p(2) in Te; that is, f(p(1)) = p(2)p(1) or f(p(1)) = p(2)p(2). If P is
basis-linearizable, then P is minimal.

Proof. Construct a normalised containment mapping g from T3 into Te; g is a containment
mapping from T3 into the right-linear tree T4 of depth 2 (see Figure 3.27). Let Q be the set
of right-linear open expansions generated by P; as we mentioned in Chapter 1, Q may be
considered a program. Then, T4 is the minimal violation of one-boundedness in Q; by (the
discussion following) Theorem 1.7, we may conclude that the set of right-linear expansions
of P is one-bounded.

Now, consider the minimal violation T1 of right-linearity in P. Since we assumed that P
is basis-linearizable, T1 is contained in some right-linear expansion. However, since the set
of right-linear expansions is one-bounded, we conclude that T1 is contained in a right-linear
expansion of depth at most 1, and the result follows. □

The following lemma details the property of cyclic programs that will be used in the
following two sections. This lemma is key in the proof that if the minimal violation T1 of
right-linearity is contained in a long right-linear expansion under an unacceptable
containment mapping, then T1 is minimal (that is, T1 is contained in the right-linear expansion of
depth 1).

Lemma 3.33 Assume that P is basis-linearizable, and that the p-atoms in the recursive
rule have the form

p(1)( A B )    p(2)( X Y )

where A and B are connected nondistinguished variables, and where X and Y are
distinguished. Then P is minimal.

Proof. Consider the complete expansion Te of depth 2; since P is assumed to be
basis-linearizable, Te is contained in some right-linear expansion. By the preceding discussion, and
by Lemma 3.29, Te must be contained in the right-linear expansion T3 of depth 1. By Lemmas 3.31
and 3.32, P is minimal. □

The result of Zhang, Yu and Troy. Zhang et al. ([40]) claim a decision procedure
for basis-linearizability in the restricted case in which the recursive rule has at most 1 EDB
subgoal. As we mentioned in Section 3.1.1, the proof¹ of [40] is flawed. The flaw is,
essentially, the fact that they neglect the case covered by Lemma 3.32. They claim the
following result.

Define the program P to satisfy "Property 0" if

1. There is a partial mapping from p(1) into p(2) that preserves the distinguished variables
in p(1).

2. There are two distinct nondistinguished variables A and C that appear only in p(1)
and p(2), in the following positions:

p(1)( A C )    p(2)( C A )

3. P is not minimal.

The result of [40] is the claim that the complete tree Te of depth 2 is not contained in
any right-linear expansion. The following program satisfies "Property 0," and yet Te is
contained in the depth-1, right-linear expansion T3.

r1 : p(X,Y,W,Z) :- p(U,X,A,C), p(X,X,C,A), e(Y,W,Z).

r2 : p(X,Y,W,Z) :- b(X,Y,W,Z).

The  weaker  result  of  Lemma  3.33  suffices  for  our  purposes. 

The case f(p(1)) = p(1)p(2). Let us return to the containment of the minimal violation T1
of right-linearity (see Figure 3.28) in a right-linear expansion T5 of depth at least 2. Let f
be a containment mapping from T5 into T1 such that f(p(1)) = p(1)p(2) (see Figure 3.28).
We will show that if P is basis-linearizable, then P is minimal. As before, if p(1) and p(2) are
not connected, then P is minimal by Lemma 3.11. Assume that p(1) and p(2) are connected.

¹The paper has been published in TODS ([41]) without a proof.

Figure 3.28: The case f(p(1)) = p(1)p(2).

Lemma 3.34 For all j, if p(2)[j] is nondistinguished, then so is p(1)[j].

Corollary. If p(1)[j] is a distinguished variable, then p(2)[j] is distinguished.

Proof. The proof follows from the fact that f preserves distinguished variables. If p(2)[j]
is nondistinguished, then p(1)p(2)[j] would also be nondistinguished; hence, by Lemma 3.7
and the assumed mapping f(p(1)) = p(1)p(2), p(1)[j] must be nondistinguished. □


Lemma  3.35  Assume  that  the  arguments  of  the  p-atoms  in  V  are  of  the  following  form. 

P(i)(  b  )  P(2)(  B  C  ) 


Then f(B) is Xj, D or C.

Proof. The mapping f(p(1)) = p(1)p(2) yields the fact that p(1)p(2)[k] is Xj. The kth
arguments of p(1)p(1) and p(2) are D and C respectively. The kth argument of p(2)p(1) is B;
our result follows by an examination of all possible destinations for this atom. □

Recall that a Class 2 variable is a distinguished variable Xi that appears only in p(2),
such that p(2)[i] ≠ Xi. Recall also that a Class 2 position is an argument position in p(2)
that is occupied by a Class 2 variable.

Lemma 3.36 If k is a Class 2 position, then Xk is a Class 2 variable.

Proof. The picture is as follows, where C ≠ Xj, and where Xj appears only in p(2).

      j  k
p(2)( C  Xj )

By safety, Xk appears in the rule body. By Lemma 3.16, since Xj does not appear in p(1),
Xk does not appear in any ei. Assume that Xk appears in some position l in p(1). Then,
Xj appears in position l in p(2)p(1), and this atom is a leaf in T5. Consider the possible
destinations of this leaf. Now, Xj appears nowhere in p(1), and hence appears nowhere in
p(1)p(1) or p(1)p(2); hence, f(p(2)p(1)) = p(2), and p(2)[l] = Xj. The picture is

      l               j  k   l
p(1)( Xk )      p(2)( C  Xj  Xj )

The assumed mapping f(p(1)) = p(1)p(2) requires that the lth position of p(1)p(2) is Xk; that
is, p(1)[j] is Xk. However, in this case, p(2)p(1)[j] is Xj, and the mapping f(p(2)p(1)) = p(2)
forces C = Xj, contradicting our assumption that Xj is Class 2.

□


Lemma 3.37 i is a Class 2 position iff Xi is a Class 2 variable.

Proof.  By  Lemma  3.36  and  pigeonholing,  as  in  the  proof  of  Lemma  3.24.  □ 


Lemma  3.38  V  is  cyclic. 

Proof.  By  Lemmas  3.36  and  3.37.  □ 


Lemma 3.39 Assume the picture

p(1)( A C )    p(2)( Xk A )

where A and C are nondistinguished and Xk ≠ Xi. Then f cannot exist.

Proof. By safety, Xi must appear in the rule body. By Lemma 3.16, it cannot appear in
any ei. Assume it appears in p(1), in position l say. The mapping f(p(1)) = p(1)p(2) requires
that p(1)p(2)[l] is Xi; hence, p(2)[l] is some distinguished variable Xm ≠ Xi such that p(1)[m]
is Xi. Now, the lth argument of the leaf p(2)p(1) in T5 is Xk, but Xk does not appear in
position l in any leaf in T1; that is, the lth arguments of the leaves in T1 are as below.

          l                  l              l
p(1)p(1)( A )     p(1)p(2)( Xi )     p(2)( Xm )

Hence, Xi appears only in p(2), and is thus Class 2. By Lemma 3.37, k must be a Class
2 position, a contradiction since A is nondistinguished. □


Lemma 3.40 Assume the picture

      i  j            j
p(1)( A  C )    p(2)( B )

where A and B are connected nondistinguished variables. If P is not minimal, then C ≠ B,
A = B and f(A) is Xj or C.

Proof. By Lemma 3.34, C is nondistinguished. If C = B then P is minimal by Lemma
3.33. Assume that C ≠ B. By safety, Xj appears in p(1), p(2) or some ei.

If Xj appears in some ei, then our result follows by Lemma 3.16 and connectivity
(Lemma 3.13). Assume that Xj appears in no ei. By Lemma 3.37, Xj cannot appear only
in p(2); hence, Xj appears in p(1), in position k say.

The assumption f(p(1)) = p(1)p(2) requires that p(2)[k] be some distinguished variable
Xl different from Xj such that p(1)[l] is Xj. Since A and C are nondistinguished, k ≠ i and
k ≠ j. The situation is

      i  j  k   l             j  k
p(1)( A  C  Xj  Xj )    p(2)( B  Xl )

By Lemma 3.35, f(B) is one of Xj, C and Xl. Since we assumed C ≠ A, by connectivity
we must have A = B. We will show that f(A) ≠ Xl to complete the proof.

For the assumed mapping on p(1), p(2)[i] must be Xm for some m (m ≠ l, m ≠ j) such
that p(1)[m] is Xl. The new picture is

      i  k   l   m             i   j  k
p(1)( A  Xj  Xj  Xl )    p(2)( Xm  A  Xl )

An examination of destinations for p(2)p(1) suffices to show that we must have f(p(2)p(1)) =
p(2). This mapping yields f(A') = A; however, in this case, neither p(2)p(2) nor p(2)p(2)p(1)
has a destination among the leaves of T1. □


Lemma 3.41 Assume the picture

p(1)( A C )    p(2)( A )

If f(A) = C, then P is minimal.

Proof. Assume f(A) = C. If C = A, then P is minimal by Lemma 3.33. Assume C ≠ A;
we will show that f cannot exist. The mapping f(p(1)) = p(1)p(2), along with Lemma 3.34,
requires that C is nondistinguished. The same mapping and the assumption f(A) = C
requires that p(2)[i] be some distinguished variable Xk such that p(1)[k] is C. Since A ≠ C,
we conclude that Xk ≠ Xi. The new picture is

p(1)( A C C )    p(2)( Xk A )

Now, the mapping f(p(1)) = p(1)p(2) yields f(C) = A'; hence, p(2)[k] = A. By Lemma 3.39,
f cannot exist. □


Lemma 3.42 Assume that the arguments of p(1) and p(2) are as follows, where A and C
are nondistinguished (hence i ≠ j).

p(1)( A C Xj )    p(2)( Xk A )

Let T5 be a right-linear tree of depth r > 1. Then, the kth argument of the leaf p(2)^r of T5
is some Xl ≠ Xj.

Proof. Consider the set S = {k1, ..., kn} of all positions in p(1) that are occupied by Xj; by
assumption, this set has cardinality at least 1. Since p(1)[i] and p(1)[j] are nondistinguished,
kq ≠ i and kq ≠ j for all q. The requirement f(p(1)) = p(1)p(2), along with the fact that f
is the identity on distinguished variables, shows that for each l ∈ S, p(2)[l] is Xm for some
m ∈ S. A straightforward induction on 1 ≤ q ≤ r shows that for any l ∈ S, there is an
m ∈ S such that the lth argument of p(2)^q is Xm. Our result follows. □
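
The induction in this proof can be traced mechanically. The following is a minimal sketch under an encoding that is assumed here and does not come from the text: positions are plain integers, and a dictionary sigma records, for each position l in S, the index m such that p(2)[l] is Xm. Expanding the rightmost atom once more replaces each Xm by the mth argument of the atom one level up, so the lth argument of p(2)^q is found by following sigma q times.

```python
def argument_index(sigma, position, depth):
    """Return m such that the argument at `position` of p(2)^depth is X_m.

    sigma[l] = m encodes "p(2)[l] is X_m" for every position l in S
    (an illustrative encoding, assumed for this sketch).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    index = position
    for _ in range(depth):
        # One more expansion substitutes X_index by the argument found
        # at position `index` one level up.
        index = sigma[index]
    return index


# Hypothetical instance: X_j occupies positions 3 and 4 of p(1), so S = {3, 4}.
S = {3, 4}
sigma = {3: 4, 4: 3}  # p(2)[3] is X_4 and p(2)[4] is X_3

# Every position in S keeps pointing into S at every depth tried.
closed = all(argument_index(sigma, l, q) in S for l in S for q in range(1, 6))
```

Since j itself is not in S, every index reached this way differs from j, which is the statement of the lemma for this instance.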

Lemma 3.43 Assume that the arguments of p(1) and p(2) are as follows, where A and C
are nondistinguished and C ≠ A. Then f(A) ≠ Xj.

      i               j
p(1)( A )       p(2)( B )

Proof. If f(p(1)) = p(1)p(2) and f(A) = Xj, then we have the following picture where
k ≠ i, k ≠ j.

p(1)( A C Xj )    p(2)( Xk A )

The kth argument of the leaf p(1)p(1) in T1 is the nondistinguished variable C. Since we
assume f(p(1)) = p(1)p(2), the kth argument of p(1)p(2) is Xj. The ith and jth arguments
of the p-leaves in T1 are as follows.

          i   j                 i   j             i   j
p(1)p(1)( A'  C )     p(1)p(2)( Xj  A' )    p(2)( Xk  A )

Let T5 have depth n + 1, for some n ≥ 1. Consider the lowest p-leaves in T5; that
is, the atoms p(2)^n p(1) and p(2)^(n+1). By Lemma 3.42, the kth argument of p(2)^(n+1) is some
distinguished variable distinct from Xj, and hence we must have f(p(2)^(n+1)) = p(2). Now,
the ith argument of p(2)^n p(1) and the jth argument of p(2)^(n+1) are both A^[n] (recall that this
is the variable A, primed n times). The mapping f(p(2)^(n+1)) = p(2) forces f(A^[n]) = A;
however, A does not appear in the ith position of any p-leaf in T1, and hence p(2)^n p(1) has
no destination in T1. □


Lemma 3.44 P is minimal.

Proof. By Lemmas 3.40, 3.41 and 3.43. □

The case f(p(1)) = p(2). Assume that P is basis-linearizable, and that the minimal
violation T1 of Figure 3.29 is contained in a right-linear tree T5 of depth at least 2. Assume
further that the containment mapping f from T5 into T1 satisfies f(p(1)) = p(2) (see Figure
3.29). We will show that P is minimal or p(1) is an adjunct to p(2), to complete the proof.

Lemma 3.45 For all i and j, if p(1)[i] is the distinguished variable Xj, then p(2)[i] is Xj.

Figure 3.29: The case f(p(1)) = p(2).

Proof. Since f is a containment mapping from T5 into T1, and since both expansions have
the root p(X1, ..., Xn), we conclude that f(Xi) = Xi for all i. Our result follows because
f(p(1)) = p(2) by assumption. □

Recall that a Class 2 variable is a distinguished variable Xi that appears only in p(2),
such that p(2)[i] ≠ Xi. Recall also that a Class 2 position is a position in p(2) that is
occupied by a Class 2 variable.

Lemma 3.46 If k is a Class 2 position, then Xk is a Class 2 variable.

Proof. Assume the following scenario, where C ≠ Xj (so j ≠ k), and where Xj appears
only in p(2).

      j  k
p(2)( C  Xj )

By safety, Xk appears in the rule body. By Lemma 3.16, it cannot appear in any ei.
Assume it appears in p(1), in position l; the mapping f(p(1)) = p(2) forces p(2)[l] = Xk (so
l ≠ j, l ≠ k). The picture is

      l               j  k   l
p(1)( Xk )      p(2)( C  Xj  Xk )

Now, Xj appears in the lth position of the leaf p(2)p(1) in T5, but appears nowhere in the
leaves p(1)p(1) or p(1)p(2) in T1 (since Xj does not appear in p(1)); also, the lth argument of
p(2) is Xk ≠ Xj, so p(2)p(1) has no legal destination in T1. Thus, Xk appears only in p(2).
□

Lemma 3.47 For all i, i is a Class 2 position iff Xi is a Class 2 variable.
Corollary. P is cyclic.

Proof. The proof follows by Lemma 3.46 and pigeonholing. □


Lemma 3.48 Assume that A is a nondistinguished variable appearing in p(1). Then f(A) ≠ Xj.

Proof. Assume the converse. By the assumed mapping f(p(1)) = p(2), the picture is the
following.

p(1)( A C )    p(2)( Xj A )

We show that T5 cannot have depth 2 or greater, a contradiction since T5 is assumed to
have depth at least 2.

Since A is nondistinguished, i ≠ j. By Lemma 3.45, C must be nondistinguished. By
safety, Xi must appear in the rule body.

If Xi appears in any ei, then by Lemma 3.16, f(Xj) is A or Aj, a contradiction. Assume
that Xi appears in p(1), in position k say. Then, by Lemma 3.45, p(2)[k] is Xi. The picture
is

p(1)( A C Xi )    p(2)( Xj A Xi )

Now, since T5 has depth at least 2, the leaf p(2)p(1) appears in it. The kth argument of this
leaf is Xj. However, Xj does not appear in the kth position in any leaf in T1, a contradiction.

Hence, Xi appears only in p(2). However, then Xi is a Class 2 variable, and we conclude
by Lemma 3.47 that p(2)[i] (that is, Xj) is a Class 2 variable. However, this result contradicts
Lemma 3.47 because p(2)[j] is a nondistinguished variable, and is hence not Class 2. □


Lemma 3.49 p(1) is an adjunct to p(2) or P is minimal.

Proof. We will show the following.

1. If a distinguished variable Xj appears in the ith position in p(1), then p(2)[i] is Xj.

2. If a nondistinguished variable A appears in p(1) (in the ith position, say) and in some
ei, then p(2)[i] is A.

3. If a nondistinguished variable appears in p(1) and p(2), then P is minimal.

Hence, if P is not minimal, then every nonlocal variable appearing in the arguments of p(1)
appears in the same position in p(2); that is, p(1) is an adjunct to p(2).

To prove (1), we observe that by Lemma 3.45, every distinguished variable in p(1) appears
in the same position in p(2).

To prove (2), we proceed as follows. Assume that p(1) shares a nondistinguished variable
A with an EDB subgoal ei, and that A is the ith argument of p(1). By Lemma 3.15, f(A) is
A or A'. However, no primed variable appears in p(1), so the mapping f(p(1)) = p(2) forces
p(2)[i] = A.

Finally, we prove (3). Assume, now, that a nondistinguished variable A appears in
position i in p(1), and in position j in p(2). We show that P is minimal. By Lemma 3.33, if
i = j then P is minimal. Assume i ≠ j, and let the jth argument of p(1) be C. The picture
is

      i  j            j
p(1)( A  C )    p(2)( A )

By Lemma 3.45, since p(2)[j] is nondistinguished, C must be nondistinguished. Further, by
Lemma 3.33, if p(2)[i] is nondistinguished then P is minimal. Assume that p(2)[i] is some
distinguished variable Xk; the mapping f(p(1)) = p(2) forces f(A) = Xk. The picture is

      i  j            i   j
p(1)( A  C )    p(2)( Xk  A )

By safety, Xj appears in the rule body. Assume that Xj appears in some ei; by Lemma
3.17, we must have f(A) = Xj or f(A) = C. Since C is nondistinguished, the former must
hold, so that Xk = Xj, contradicting Lemma 3.48. By Lemma 3.47, Xj cannot appear
only in p(2), since in this case Xj would be Class 2 but p(2)[j] = A is nondistinguished.
Thus, Xj must appear in p(1), in position m say. Then, the mapping f(p(1)) = p(2) forces
p(2)[m] = Xj. The picture is

      i  j  m             i   j  m
p(1)( A  C  Xj )    p(2)( Xk  A  Xj )

Now, the leaf p(2)p(1) must appear in T5, and the mth argument of this leaf is A. The
mth arguments of the leaves p(1)p(1) and p(1)p(2) in T1 are each C ≠ Xk, so we must have
f(p(2)p(1)) = p(2). Thus, p(2)[m] must be Xk, so Xk = Xj, contradicting Lemma 3.48. □

The  proof  is  now  complete. 


Chapter  4 

Undecidability  of  the  general 
problems 


4.1  Introduction 

Finally,  let  us  turn  to  the  decidability  of  basis-linearizability  and  sequencability.  In  this 
chapter,  we  will  show  that  both  these  problems  are  undecidable  for  a  restricted  class  of 
Datalog  programs. 

4.1.1  Definitions 

Consider the following single-IDB, safe, Datalog program P, in which the head of every rule
is rectified (i.e., contains no repetitions of any variable). Recall that a program is single-IDB
iff there is only one intensional predicate in P.

Let P consist of the n recursive rules

r1 : p(X0) :- p(X11), ..., p(X1k1), 𝒞1.
...
ri : p(X0) :- p(Xi1), ..., p(Xiki), 𝒞i.
...
rn : p(X0) :- p(Xn1), ..., p(Xnkn), 𝒞n.

and the m nonrecursive rules

b1 : p(X0) :- 𝒟1.
...
bj : p(X0) :- 𝒟j.
...
bm : p(X0) :- 𝒟m.

where the 𝒞i's and 𝒟j's are arbitrary conjunctions of EDB predicates.

Figure 4.1: Right-linear query.


Base-case linearizability

Recall from Section 1.4.3 that a conjunctive query generated by P is right-linear if only the
rightmost occurrence of p is ever recursively expanded (see Figure 4.1). Recall also that P
is basis-linearizable iff every conjunctive query generated by P is contained in a right-linear
conjunctive query.

Sequencability

Recall from Section 1.4.4 that a conjunctive query generated by P is termed sequenced if
ri is never used to expand a p-atom introduced by rj in a top-down expansion generating
this conjunctive query, if i < j (see Figure 4.2). Recall also that P is termed sequencable iff
every conjunctive query generated by P is contained in a sequenced query generated by P.
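
Both definitions are conditions on the shape of a top-down expansion, so they can be checked by a simple traversal. The sketch below assumes an illustrative encoding that is not taken from the text: an expansion is a pair (i, children), meaning that the recursive rule ri was applied, with one entry in children per p-atom that ri introduces, from left to right; an entry is None when that p-atom is not expanded by a recursive rule.

```python
def is_right_linear(tree):
    """Only the rightmost p-atom of each rule application is expanded recursively."""
    if tree is None:
        return True
    _, children = tree
    if any(child is not None for child in children[:-1]):
        return False
    return not children or is_right_linear(children[-1])


def is_sequenced(tree):
    """r_i is never used to expand a p-atom introduced by r_j when i < j."""
    if tree is None:
        return True
    rule, children = tree
    for child in children:
        if child is not None and child[0] < rule:
            return False
        if not is_sequenced(child):
            return False
    return True


# r1 introduces two p-atoms; only the right one is expanded, by r2 and then r2 again.
good = (1, [None, (2, [None, (2, [None, None])])])
# r2 introduces two p-atoms; the left one is expanded by r1.
bad = (2, [(1, [None, None]), None])
```

Under this encoding, `good` passes both checks, while `bad` fails both: it expands a non-rightmost p-atom, and it applies r1 below r2.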

4.2  Results 

We state below the main results of this chapter. In the following statements, P and Q are
safe Datalog programs defining a single intensional predicate, using head-rectified rules.

Result 4.1 P ⊆ Q (P = Q) is undecidable, even if

(a) P and Q are linear, and have no more than five basis rules; or

(b) Each of P and Q contains only one recursive rule and nine nonrecursive rules.

□

Result 4.2 The base-case linearizability of P is undecidable, even if P contains only one
nonlinear rule and five basis rules. □

Figure 4.2: Sequenced

Result 4.3 The sequencability of P is undecidable, even if P contains only two recursive
rules and nine basis rules. □

4.2.1  Related  results 

Shmueli  ([33])  and  Abiteboul  ([1])  present  general  results  that  also  yield  undecidability 
results  for  program  equivalence.  Shmueli  considers  programs  with  a  single  recursive  predi¬ 
cate;  however,  these  programs  have  several  IDB  predicates,  and  include  rules  whose  heads 
are  not  rectified.  The  assumption  of  head-rectification  is  integral  to  the  proof  of  Theorem 
3.1. Abiteboul's result concerns single-IDB programs with a single recursive rule; however,
those  programs  have  rules  that  are  not  head-rectified,  and  contain  an  unbounded  number 
of  initialisation  rules. 

4.3  Outline 

Our undecidability results involve reductions from undecidable problems for context-free
grammars. In Section 4.4, we will define certain normal forms for context-free grammars,
and show that the containment of such normal-form grammars is undecidable. In this
section,  we  will  also  show  how  unsafe  Datalog  programs  may  be  made  safe,  allowing  the 
reductions  of  the  subsequent  sections  to  involve  unsafe  programs. 

In  Section  4.5,  we  will  prove  Results  4.1(a)  and  4.2.  Section  4.6  contains  the  proof  of 
Results  4.1(b)  and  4.3. 

4.4  Preliminaries 

In  this  section,  we  present  results  that  will  be  of  use  in  the  proofs  of  the  next  two  sections. 

4.4.1  Context-free  grammars 

The  results  of  the  following  sections  will  be  based  upon  reductions  from  undecidable  prob¬ 
lems  in  language  theory.  In  this  subsection,  we  establish  some  preliminary  results.  Our 
treatment  is  concise,  and  assumes  concepts  that  are  explained  in  [15]. 

Definition 4.1 For any grammar G, a production A → β is termed intensional if at least
one nonterminal appears in β, and extensional otherwise. □

Definition 4.2 A grammar G over the alphabet Σ = {a,b} is termed bounded-basis if G is
ε-free and if there are only two extensional productions in G, which are of the form Na → a
and Nb → b where Na ≠ Nb and where neither Na nor Nb appears as the head of any other
production. Hence, we may partition the nonterminals into intensional and extensional
nonterminals, depending on whether they appear on the left-hand side of an intensional or
extensional production. Na is termed the extensional nonterminal representing a, and Nb
is termed the extensional nonterminal representing b. □
Lemma 4.1 Given any linear grammar H over Σ = {a,b}, we may effectively construct a
linear, bounded-basis grammar G such that L(G) = L(H) − {ε}.

Proof. Construct a linear, ε-free grammar I for L(H) − {ε} by determining nullable
nonterminals and following the procedure of Theorem 4.3 in Section 4.4 of [15]. Assume that
S is the start symbol of I. We construct G from I, as follows. Introduce new nonterminals
Na and Nb, and the productions Na → a and Nb → b. Finally, consider any extensional
production A → α. If α is a, replace the production by the unit production A → Na; if α
is b, replace the production by A → Nb. Otherwise, α is of the form βa or βb. In the first
case, replace the production by the production A → βNa, and in the second, replace the
production by A → βNb. □
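
The last step of this construction is mechanical, and a small sketch may make it concrete. The encoding below is an assumption made for illustration only: a grammar is a list of (head, body) pairs, each body is a non-empty tuple of symbols, the terminals are the strings "a" and "b", and the fresh nonterminal names "N_a" and "N_b" are assumed not to occur in the input. The input is taken to be ε-free and linear already; the sketch does not perform that first step.

```python
TERMINALS = {"a", "b"}
FRESH = {"a": "N_a", "b": "N_b"}  # assumed not to occur in the input grammar


def to_bounded_basis(productions):
    """Rewrite an epsilon-free linear grammar following the construction above.

    `productions` is a list of (head, body) pairs, where body is a non-empty
    tuple of symbols over the terminals "a" and "b" plus nonterminals.
    """
    result = [("N_a", ("a",)), ("N_b", ("b",))]
    for head, body in productions:
        if all(symbol in TERMINALS for symbol in body):
            # Extensional production: replace its last terminal by N_a or N_b.
            body = body[:-1] + (FRESH[body[-1]],)
        result.append((head, body))
    return result


def extensional(productions):
    """Productions whose body contains no nonterminal."""
    return [(h, b) for h, b in productions if all(s in TERMINALS for s in b)]


# Hypothetical linear grammar: S -> a S b | a b | b
grammar = [("S", ("a", "S", "b")), ("S", ("a", "b")), ("S", ("b",))]
bounded = to_bounded_basis(grammar)
```

After the rewrite, the only extensional productions left are the two new ones, and every rewritten body still contains exactly one nonterminal, so linearity is preserved in this example.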

Definition 4.3 A Modified Chomsky Normal Form (MCNF) grammar G over the terminal
alphabet Σ = {a, b} is a grammar with the following properties.

1. G is bounded-basis.

2. Each intensional nonterminal N appears at the head of at most two productions.
Further, if N does appear as the head of two productions, then both productions are
unit productions. That is, intensional productions are of three types.

(a) If N → M and N → K are productions, then these productions are or-productions.

(b) N → MK is an and-production.

(c) If N appears only on the left-hand side of the production N → M, then this
production is a copy production.

□


Lemma 4.2 For every grammar H over Σ = {a, b}, there is an MCNF grammar G
generating L(H) − {ε}.

Proof. Construct a Chomsky Normal Form grammar for L(H) − {ε}, with start symbol S.
Introduce the nonterminals Na and Nb, and the productions Na → a and Nb → b. Then,
replace every other production of the form N → c, where c is a terminal, by the production
N → Nc. All productions other than Na → a and Nb → b are now intensional. Replace
every and-production N → MK by the two productions N → L and L → MK, where
L is a new nonterminal. At this point, the only violations of MCNF are the presence of
nonterminals N such that N → R1, ..., N → Rk+1, for k > 1, are the productions with N
on the left-hand side. Introduce new nonterminals M1, ..., Mk; then, add the productions
N → M1 and Mk → Rk+1; finally, for 2 ≤ l ≤ k, replace the production N → Rl with the
two productions Ml−1 → Rl and Ml−1 → Ml. □

Lemma 4.3 Assume Σ = {a,b}. It is undecidable, for an arbitrary context-free grammar
(CFG) G over the alphabet Σ, whether Σ* ⊆ L(G). This result is true even if G is linear.

α is a string of terminals.

Proof. Let Σ = {a1, ...}; the corresponding problems over this alphabet are known to be
undecidable ([15]). The size of the alphabet can be reduced to 2 by padding and encoding.
□


Lemma 4.4 Let G1 and G2 be bounded-basis grammars over the alphabet Σ = {a,b}.
Then G1 ⊆ G2 is undecidable. This result is true even if G1 and G2 are required to be
linear, or if both grammars are required to be in Modified Chomsky Normal Form.

Proof. Let G1 be the obvious bounded-basis, linear (or MCNF) grammar generating Σ+.
Testing whether ε ∈ L(G2) is decidable, and our result follows by Lemmas 4.1 and 4.3 (4.2
and 4.3 if G2 is MCNF). □

Lemma 4.5 Let G be a linear, bounded-basis grammar over Σ = {a,b}. Then, Σ+L(G) ⊆
ΣL(G) is undecidable.

Proof. Σ+ ⊆ L(G) iff

1. Σ ⊆ L(G) and

2. Σ+L(G) ⊆ ΣL(G).

Since 1 is decidable, our result then follows by Lemma 4.4.

□

4.4.2  Datalog  programs 

The  following  lemma  will  be  of  use  in  the  following  sections. 

Lemma 4.6 Let C and D be the conjunctive queries

C : p(X) :- 𝒞.

D : p(X) :- 𝒟, f(Z).

where 𝒞 and 𝒟 are conjunctions of EDB predicates, Z is a distinguished variable (i.e. Z
appears in X) and f(Z) does not appear in 𝒞. Then C ⊄ D.

Proof. Every containment mapping g : D → C must satisfy g(Z) = Z, since g(p(X)) must
be p(X). However, then g(f(Z)) does not appear in the body of C. □
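
For small queries, the existence of a containment mapping can be tested by exhaustive search, which gives a concrete check of the lemma. The sketch below is only an illustration under assumed conventions: a query is a pair (head_args, body), where head_args is a tuple of variable names and body is a list of atoms (predicate, args); the predicate names c and f and the two example queries are hypothetical. The search is exponential in the number of variables and is not meant as an efficient procedure.

```python
from itertools import product


def containment_mapping(src, dst):
    """Search for a containment mapping from query `src` into query `dst`.

    A query is (head_args, body): head_args is a tuple of variables and body
    is a list of atoms (predicate, args). Returns a dict, or None if no
    mapping exists.
    """
    src_head, src_body = src
    dst_head, dst_body = dst
    if len(src_head) != len(dst_head):
        return None
    src_vars = sorted(set(src_head) | {v for _, args in src_body for v in args})
    dst_vars = sorted(set(dst_head) | {v for _, args in dst_body for v in args})
    dst_atoms = set(dst_body)
    for image in product(dst_vars, repeat=len(src_vars)):
        g = dict(zip(src_vars, image))
        # The head of src must be mapped onto the head of dst.
        if tuple(g[v] for v in src_head) != tuple(dst_head):
            continue
        # Every body atom of src must land on a body atom of dst.
        if all((pred, tuple(g[v] for v in args)) in dst_atoms
               for pred, args in src_body):
            return g
    return None


# Hypothetical instance of the lemma: D carries the extra conjunct f(Z).
C = (("X", "Z"), [("c", ("X", "Z"))])
D = (("X", "Z"), [("c", ("X", "Z")), ("f", ("Z",))])
```

Here no mapping from D into C exists, because the atom f(Z) has no image in the body of C, while the identity maps C into D.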

Safety 

Recall  that  a  rule  is  termed  safe  if  every  variable  appearing  in  the  rule  head  (a  distinguished 
variable)  appears  in  the  rule  body,  and  a  program  is  termed  safe  if  every  rule  in  the  program 
is  safe.  The  constructions  of  the  following  sections  will  deal  with  programs  that  are  unsafe. 
However,  these  programs  may  be  made  safe  without  altering  the  results  of  these  sections, 
as  follows. 

Let r be a (not necessarily safe) conjunctive query (or rule), and let e be a predicate not appearing
in r. Then the notation safe(r,e) represents the query obtained from r by adding conjuncts
e(A) to the body of r for every variable A that appears in r. Similarly, the notation safe(P,e)
represents the replacement of every rule r in the program P with the rule safe(r,e) for some
predicate e that appears nowhere in P.

Example 4.1 If r is the rule

r : p(X,Y) :- b(X,U).

then safe(r,e) is the rule

p(X,Y) :- b(X,U), e(X), e(Y), e(U). □
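
The safe(r,e) transformation can be sketched in a few lines of Python. This is only an illustration, not part of the original construction: the tuple encoding of atoms and the helper names `variables`, `safe` and `show` are assumptions made for the sketch.

```python
def variables(rule):
    """Variables of a rule, in order of first occurrence (head first)."""
    head, body = rule
    seen = []
    for _pred, args in [head] + body:
        for v in args:
            if v not in seen:
                seen.append(v)
    return seen


def safe(rule, e="e"):
    """Return safe(r, e): r with a conjunct e(A) added for every variable A of r."""
    head, body = rule
    if any(pred == e for pred, _args in [head] + body):
        raise ValueError("the predicate e must not appear in r")
    return (head, body + [(e, (v,)) for v in variables(rule)])


def show(rule):
    """Render a rule in the usual Datalog notation."""
    def fmt(atom):
        return f"{atom[0]}({','.join(atom[1])})"
    head, body = rule
    return f"{fmt(head)} :- {', '.join(fmt(a) for a in body)}."


# The rule r of Example 4.1: an atom is (predicate, argument tuple).
r = (("p", ("X", "Y")), [("b", ("X", "U"))])
print(show(safe(r)))
```

Applied to the rule of Example 4.1, the sketch adds one e-atom per variable and leaves the original conjuncts untouched.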

Lemma 4.7 For any conjunctive queries r and s and any predicate e that appears nowhere
in r or s, there is a containment mapping f : s → r iff there is a containment mapping
g : safe(s,e) → safe(r,e).

Proof. The containment mapping g : safe(s,e) → safe(r,e) is a containment mapping from
s into r. For the converse, assume that f is a containment mapping from s into r. Consider
any atom e(A) in safe(s,e): by construction, A appears in s, and therefore f(A) appears
in r. Also by construction, e(f(A)) appears in safe(r,e), and f is therefore a containment
mapping from safe(s,e) into safe(r,e). □


Lemma 4.8 Let P be a program, and let e be a predicate not appearing in P. Then P
generates the top-down expansion r iff safe(P,e) generates the top-down expansion safe(r,e).

Corollary. Let P and Q be programs, and assume that e does not appear in P or Q. Then
P ⊆ Q iff safe(P,e) ⊆ safe(Q,e).

Proof. Straightforward induction on the number of rule applications in P. The corollary
follows by Lemma 4.7 and the theorem of Sagiv and Yannakakis (Theorem 1.2). □

For the remainder of this chapter, we will consider unsafe programs with the
understanding that these programs will be made safe as in the above lemma^.


4.5  Linear  Logic  Programs 

In this section, we prove Result 4.1(a) and Result 4.2. The basic idea is the simulation of
bounded-basis grammars using single-IDB programs with head-rectified rules and a bounded
number of basis rules.


^The predicate e essentially represents the DOM relation ([tti]).

4.5.1 The construction

Let G be an ε-free grammar over Σ = {a1,...,ak} with nonterminals {N1,...,Nm}. We
will construct a program P, defining the IDB predicate p, to simulate G. The EDB for P
will consist of binary predicates {a1,...,ak} and a unary predicate f.

Let X, Y, W and Z be new variable names; the head of each rule in P is

p(X,Y,W,Z,N1,...,Nm)

Note that each rule head is rectified (i.e. contains no repeated variables). Let <<Ni>>
denote an m-vector in which the ith component is W, and all other components are Z.
Further, let us use the notation

p(<A,B><<Ni>>)

to represent the p-atom

p(A,B,W,Z,<<Ni>>)

Example 4.2 Let G be a linear grammar over Σ = {a,b} with start symbol S and the
productions

S → aS    S → bB    B → b.

G clearly generates a*bb. The head of each rule in P is the atom p(X,Y,W,Z,S,B), and
p(<U,Y><<B>>) denotes the atom p(U,Y,W,Z,Z,W). □

Definition 4.4 We define a transformation from symbols in G (terminal or nonterminal)
to atomic formulae, as follows. Let U and V be distinct variables. For any terminal ai, the
{U,V}-atom corresponding to ai is the atom ai(U,V). The {U,V}-atom corresponding to a
nonterminal Ni is the atom p(<U,V><<Ni>>). This transformation may be extended
to strings, as follows. Let s = s1...sn be a nonempty string of terminals and nonterminals
in G. Assume that U1,...,Un+1 are distinct variables. We define the {U1,...,Un+1}-chain
corresponding to s to be the conjunction c = c1,...,cn where for all i, ci is the {Ui,Ui+1}-atom
corresponding to si. □

Example 4.3 Let G be the grammar of Example 4.2. The {X,U}-atom corresponding to
the terminal b is b(X,U), and the {X,U,Y}-chain corresponding to the string bB is the
conjunction b(X,U), p(<U,Y><<B>>), or b(X,U), p(U,Y,W,Z,Z,W). □
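
As an illustration of Definition 4.4, the following Python sketch builds the atom for a single symbol and the chain for a string. The encoding of atoms as (predicate, arguments) pairs and the function names `symbol_atom` and `chain` are assumptions of the sketch, not notation from the text.

```python
def symbol_atom(sym, u, v, nonterminals):
    """The {u,v}-atom corresponding to a terminal or nonterminal symbol."""
    if sym in nonterminals:
        # <<Ni>>: W in the position of the nonterminal, Z everywhere else.
        vec = tuple("W" if n == sym else "Z" for n in nonterminals)
        return ("p", (u, v, "W", "Z") + vec)
    return (sym, (u, v))


def chain(string, variables, nonterminals):
    """The {U1,...,Un+1}-chain corresponding to the nonempty string s1...sn."""
    if not string or len(variables) != len(string) + 1:
        raise ValueError("need a nonempty string and n+1 variables")
    if len(set(variables)) != len(variables):
        raise ValueError("the chain variables must be distinct")
    return [symbol_atom(s, variables[i], variables[i + 1], nonterminals)
            for i, s in enumerate(string)]


# Example 4.3: the {X,U,Y}-chain for the string bB over nonterminals S, B.
for atom in chain(["b", "B"], ["X", "U", "Y"], ["S", "B"]):
    print(atom)
```

For the grammar of Example 4.2 this yields the two conjuncts b(X,U) and p(U,Y,W,Z,Z,W) of Example 4.3.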

Definition 4.5 Let s = s1...sn be a (possibly empty) string of terminals and nonterminals,
let 𝒞 and Γ be conjunctions of atomic formulae, and let C be the conjunctive query (or rule)

C : p(H) :- Γ, 𝒞.

Let D and E be variables appearing in C. Γ is termed a chain from D to E, embedded in
C and representing s iff one of the following is true.

1. s = ε, Γ is the empty conjunction true and D = E.

2. s = s1 (a terminal or nonterminal), D and E are distinct and Γ is the {D,E}-atom
corresponding to s1.

3. s = s1...sn with n > 1, D and E are distinct and Γ is the {D,U1,...,Un-1,E}-chain
corresponding to s for some distinct variables U1,...,Un-1 distinct from D and E
and not appearing in H or among the arguments of the conjuncts in 𝒞.

If s is a nonempty string of terminals and D and E appear in H, then Γ is a binary
chain ([33]). □


Example 4.4 Consider the conjunctive query

C1 : p(X,Y,W,Z) :- a(X,U1), b(U1,U2), a(U2,Y), f(W), f(Z).

The conjunction a(X,U1), b(U1,U2), a(U2,Y) is a binary chain from X to Y representing
the terminal string aba and embedded in C1. □

Binary  chains  may  be  used  to  simulate  strings  in  a  language,  as  follows. 

Lemma 4.9 Consider the conjunctive queries

C1 : p(H) :- Γ1, 𝒞1.

C2 : p(H) :- Γ2, 𝒞2.

where Γ1, Γ2, 𝒞1 and 𝒞2 are conjunctions of EDB predicate occurrences. Let X and Y be
distinguished variables (i.e. they appear in H). Let Γ1 be a binary chain from X to Y,
embedded in C1 and representing the terminal string t = t1...tl, and let Γ2 be a binary
chain from X to Y, embedded in C2 and representing the terminal string s = s1...sn.
Assume that there is a containment mapping g from

p(H) :- 𝒞2.

into

p(H) :- 𝒞1.

Then, there is a containment mapping h : C2 → C1 iff n = l and the strings t1...tl and
s1...sn are identical.

Proof. Assume that Γ1 and Γ2 are as below.

Γ1 : t1(X,U1), ..., t_l(U_{l-1},Y).
Γ2 : s1(X,V1), ..., s_n(V_{n-1},Y).

Assume that the strings are identical. Let h be the identity mapping on distinguished
variables (the variables in H), and define h(Vi) = Ui for all i. The function f defined by

f(A) = h(A) if h is defined on A
f(A) = g(A) if g is defined on A

is a containment mapping from C2 into C1.

For the converse, assume that h is a containment mapping from C2 into C1. Then
h(X) = X and h(Y) = Y, since X and Y are distinguished.

If  n  =  0,  then  r2  =  Si(A',  T).  Since  h  is  the  identity  of  distinguished  variables,  and  since 
does  not  appear  in  Ci  by  assumption,  we  may  conclude  that  r2  =  /i(si(A,  Y))  =  5i(A",  Y ). 

Consider any n > 0. If l = 0 (that is, Γ1 = t1(X,Y)), then h(s1(X,V1)) must be
t1(X,Y), and h(V1) = Y, a contradiction since Y does not appear as the first argument
of any atom in Γ1 (so s2(V1,V2) has no destination in Γ1). If l > 0, then the only atom
in Γ1 whose first argument is X is t1(X,U1). Thus, h(s1(X,V1)) = t1(X,U1), yielding
s1 = t1 and h(V1) = U1. An inductive repetition on l may be used to show that l = n, and
that the strings are identical. □


Example 4.5 Let the conjunctive queries C1 and C2 be as follows.

C1 : p(X,Y,W,Z) :- a(X,U1), b(U1,U2), a(U2,Y), f(W), f(Z).

C2 : p(X,Y,W,Z) :- a(X,V1), b(V1,V2), a(V2,Y), f(W), f(Z).

Then, the mapping h defined by h(X) = X, h(Y) = Y, h(W) = W, h(Z) = Z, h(V1) =
U1, h(V2) = U2 is a containment mapping from C2 into C1. However, there is no containment
mapping from C3 or C4 into C1, where C3 and C4 are defined as follows.

C3 : p(X,Y,W,Z) :- b(X,V1), b(V1,V2), a(V2,Y), f(W), f(Z).

C4 : p(X,Y,W,Z) :- a(X,V1), a(V1,Y), f(W), f(Z). □
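
The claims of Example 4.5 can be checked mechanically. The following Python sketch searches for a containment mapping by simple backtracking; it is a brute-force illustration under an assumed tuple encoding of queries, not an algorithm taken from the text, and the name `containment_mapping` is hypothetical.

```python
def containment_mapping(src, dst):
    """Return a containment mapping from query src into query dst, or None.

    A query is (head_atom, body_atoms); an atom is (predicate, argument tuple).
    The mapping sends the head of src to the head of dst and every body atom
    of src to some body atom of dst.
    """
    (spred, sargs), sbody = src
    (dpred, dargs), dbody = dst
    if spred != dpred or len(sargs) != len(dargs):
        return None
    h = {}
    for a, b in zip(sargs, dargs):
        if h.setdefault(a, b) != b:
            return None

    def extend(i, h):
        if i == len(sbody):
            return h
        pred, args = sbody[i]
        for tpred, targs in dbody:
            if tpred != pred or len(targs) != len(args):
                continue
            trial = dict(h)  # work on a copy so that failed attempts leave h intact
            if all(trial.setdefault(a, b) == b for a, b in zip(args, targs)):
                found = extend(i + 1, trial)
                if found is not None:
                    return found
        return None

    return extend(0, h)


head = ("p", ("X", "Y", "W", "Z"))
guards = [("f", ("W",)), ("f", ("Z",))]
C1 = (head, [("a", ("X", "U1")), ("b", ("U1", "U2")), ("a", ("U2", "Y"))] + guards)
C2 = (head, [("a", ("X", "V1")), ("b", ("V1", "V2")), ("a", ("V2", "Y"))] + guards)
C3 = (head, [("b", ("X", "V1")), ("b", ("V1", "V2")), ("a", ("V2", "Y"))] + guards)
C4 = (head, [("a", ("X", "V1")), ("a", ("V1", "Y"))] + guards)

print(containment_mapping(C2, C1))
print(containment_mapping(C3, C1), containment_mapping(C4, C1))
```

The search finds the mapping h of the example for C2 and fails for C3 (wrong string) and C4 (wrong length), as Lemma 4.9 predicts.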

The transformation from G to P is effected by the following algorithm.

Algorithm 4.1

INPUT: an ε-free grammar G with nonterminals N1,...,Nm.

OUTPUT: a single-IDB program P to simulate G^.

1. The head of every rule is p(X,Y,W,Z,N1,...,Nm).

2. Consider every production

dk : Nj → β

in G, where β is of length n. Let {U1,...,Un-1} be a set of new and distinct variables,
and let γ be the {X,U1,...,Un-1,Y}-chain representing β. We construct the rule

rk : p(X,Y,W,Z,N1,...,Nm) :- γ, f(Nj), f(W).

3. Add a single basis rule

b : p(X,Y,W,Z,N1,...,Nm) :- f(Z).


^The manner of the simulation will be discussed later.

Figure 4.3: Simulating a derivation. [The derivation step Ni → ak Nj, below the start
symbol S, is shown beside the expansion of the atom p(A,B,W,Z,<<Ni>>), below the root
p(X,Y,W,Z,N1,...,Nm), into ak(A,U), p(U,B,W,Z,<<Nj>>).]

If G is linear, then P is linear. Further, if G has k extensional productions, then P has
k + 1 initialisation rules.

Example 4.6 Consider the linear grammar G of Example 4.2. Our construction produces
the following rules.

p(X,Y,W,Z,S,B) :- a(X,U), p(U,Y,W,Z,W,Z), f(S), f(W).

p(X,Y,W,Z,S,B) :- b(X,V), p(V,Y,W,Z,Z,W), f(S), f(W).

p(X,Y,W,Z,S,B) :- b(X,Y), f(B), f(W). □
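
A minimal Python sketch of Algorithm 4.1 follows. It assumes that productions are given as (nonterminal, list of symbols) pairs and that no nonterminal is named like one of the generated variables U1, U2, ...; the function names are illustrative only. Unlike the listing of Example 4.6, the sketch also prints the basis rule b of step 3.

```python
from itertools import count


def algorithm_4_1(nonterminals, productions):
    """Build the rules of the program simulating an ε-free grammar."""
    head = ("p", ("X", "Y", "W", "Z") + tuple(nonterminals))
    fresh = count(1)

    def atom(sym, u, v):
        if sym in nonterminals:
            vec = tuple("W" if n == sym else "Z" for n in nonterminals)
            return ("p", (u, v, "W", "Z") + vec)
        return (sym, (u, v))

    rules = []
    for lhs, rhs in productions:
        # Step 2: a chain from X to Y through n-1 new variables.
        ends = ["X"] + [f"U{next(fresh)}" for _ in rhs[1:]] + ["Y"]
        gamma = [atom(s, ends[i], ends[i + 1]) for i, s in enumerate(rhs)]
        rules.append((head, gamma + [("f", (lhs,)), ("f", ("W",))]))
    # Step 3: the single basis rule b.
    rules.append((head, [("f", ("Z",))]))
    return rules


def show(rule):
    def fmt(a):
        return f"{a[0]}({','.join(a[1])})"
    head, body = rule
    return f"{fmt(head)} :- {', '.join(fmt(a) for a in body)}."


# The grammar of Example 4.2: S -> aS, S -> bB, B -> b.
program = algorithm_4_1(["S", "B"],
                        [("S", ["a", "S"]), ("S", ["b", "B"]), ("B", ["b"])])
for rule in program:
    print(show(rule))
```

Up to the names of the nondistinguished variables, the first three rules printed are those of Example 4.6.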

The importance of the variables W and Z is that they are persistent; that is, they appear
in their "home" positions in every p-atom resulting from a top-down expansion in P.

Lemma 4.10 Let P be a single-IDB program defining the predicate p, and let Xi be the
ith variable in the head of every rule. Assume that the ith argument of every p-atom in the
body of every rule is Xi. Then the ith argument of every p-leaf in any top-down expansion
of p using the rules in P is Xi.

Proof. By induction on the number n of rule applications in the expansion. □

The intention of our construction is that the rules of P mimic derivations in G, to
produce binary chains to represent every string in L(G). The idea is illustrated in Figure 4.3.
In the figure, the variable A may be a new nondistinguished variable, or the distinguished
variable X. Similarly, B may be a nondistinguished variable, or the distinguished variable
Y. U is a new nondistinguished variable.

Example 4.7 Figure 4.4 shows how a binary chain representing the string bb is generated
by the program of Example 4.6. □

However, these rules also may be used to mimic "illegal" derivations in the grammar.
That is, the production Ni → ak Nj cannot be used to expand the nonterminal Nl if
Nl ≠ Ni. However, the rule in P that corresponds to the production Ni → ak Nj can, in
fact, be used to expand a p-atom resulting from the application of the rule for Nm → ao Nl.
We detect such illegal top-down expansions through the use of the conjuncts f(Ni) in the
rules of P. A conjunctive query resulting from an illegal expansion as described above will
contain the atom f(Z), and is hence contained in the basis rule b. Figure 4.5 illustrates
the attempt of P in Example 4.6 to expand the rule representing the production S → aS
through the rule representing the production B → b.

p(X,Y,W,Z,S,B)
    b(X,V)  p(V,Y,W,Z,Z,W)  f(S)  f(W)
        b(V,Y)  f(W)  f(W)

Figure 4.4: Generating a string.

p(X,Y,W,Z,S,B)
    a(X,U)  p(U,Y,W,Z,W,Z)  f(S)  f(W)
        b(U,Y)  f(Z)  f(W)

Figure 4.5: Illegal expansion.

More formally, we say that a conjunctive query (or top-down expansion) generated by
P is illegal if its body contains the conjunct f(Z). Further, let us define legal(P) to be the
union of all the conjunctive queries generated by P that are not illegal.

Lemma 4.11 Let C be an illegal conjunctive query generated by P. Then C ⊆ b, where b
is the nonrecursive rule added in step 3 of the algorithm.

Proof. Both conjunctive queries have the root p(X,Y,W,Z,N1,...,Nm), and the identity
mapping on variables in the root is a containment mapping from b into C. □
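
The way the guard atoms expose illegal expansions can be illustrated with a small Python sketch of a single top-down expansion step. The encoding and the names `expand` and `is_illegal` are assumptions of the sketch; the rules are those of Example 4.6, and the step relies on every rule head being rectified.

```python
from itertools import count

_fresh = count(1)


def expand(query, index, rule):
    """Expand the p-atom at body position `index` of `query` through `rule`.

    Because rule heads are rectified, unification reduces to binding each head
    variable of the rule to the corresponding argument of the expanded atom.
    Nondistinguished variables of the rule are renamed to fresh names.
    """
    head, body = query
    (rpred, rargs), rbody = rule
    pred, args = body[index]
    if pred != rpred or len(args) != len(rargs):
        raise ValueError("the rule does not apply to this atom")
    theta = dict(zip(rargs, args))

    def sub(v):
        if v not in theta:
            theta[v] = f"_{v}{next(_fresh)}"
        return theta[v]

    children = [(q, tuple(sub(a) for a in qargs)) for q, qargs in rbody]
    return (head, body[:index] + children + body[index + 1:])


def is_illegal(query):
    """A query is illegal if its body contains the conjunct f(Z)."""
    return ("f", ("Z",)) in query[1]


head = ("p", ("X", "Y", "W", "Z", "S", "B"))
rule_aS = (head, [("a", ("X", "U")), ("p", ("U", "Y", "W", "Z", "W", "Z")),
                  ("f", ("S",)), ("f", ("W",))])
rule_bB = (head, [("b", ("X", "V")), ("p", ("V", "Y", "W", "Z", "Z", "W")),
                  ("f", ("S",)), ("f", ("W",))])
rule_b = (head, [("b", ("X", "Y")), ("f", ("B",)), ("f", ("W",))])

legal = expand(rule_bB, 1, rule_b)     # S -> bB followed by B -> b
illegal = expand(rule_aS, 1, rule_b)   # S -> aS followed by B -> b
print(is_illegal(legal), is_illegal(illegal))
```

The first expansion corresponds to Figure 4.4 and contains only copies of f(W) besides f(S); the second corresponds to Figure 4.5 and picks up the conjunct f(Z).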

The simulation of G by P is formalised in the next three lemmas.

Lemma 4.12 Let S be the legal top-down expansion

S : p(X,Y,W,Z,N1,...,Nm) :- A, p(<Ua,V1><<Nj>>), Γ, f(Ni), f(W).

where Ua and V1 are distinct, A is a chain from X to Ua representing some string α in S,
Γ is a chain from V1 to Y representing some string γ in S, and the conjunction

(A, p(<Ua,V1><<Nj>>), Γ)

is a chain from X to Y representing αNjγ in S.

Construct the top-down expansion T by expanding the indicated p-atom through some
rule rk constructed by Algorithm 4.1 from the production dk : Nm → β. Then

1. If Nm ≠ Nj, T is illegal; and

2. If Nm = Nj, then T is of the form

T : p(X,Y,W,Z,N1,...,Nm) :- Δ, f(Ni), f(W).

where Δ is a chain from X to Y, representing αβγ and embedded in T.

Proof. By the construction of the rule rk by Algorithm 4.1, rk is of the form

rk : p(X,Y,W,Z,N1,...,Nm) :- B, f(Nm), f(W).

where B is a {X,R1,...,Rn,Y}-chain corresponding to β and the Ri are distinct variables
not appearing in the rule head.

If Nm ≠ Nj, then (since the Nm-argument of p(<Ua,V1><<Nj>>) is Z), the body
of T contains the conjunct f(Z), and T is therefore illegal. However, if Nm = Nj, then by
the persistence of W, the only f-atoms added are copies of f(W). Assume that the Ri are
renamed to new variables Di in the expansion; then the children of the indicated p-atom in
S form a chain from Ua to V1, representing β in T. Setting Δ = (A,B,Γ) completes the
proof. □


Lemma 4.13 If δ is a sentential form derived from the nonterminal Ni by G, then

T : p(X,Y,W,Z,N1,...,Nm) :- B, f(Ni), f(W).

is a top-down expansion in legal(P), where B is a chain from X to Y representing δ and
embedded in T.

Corollary. Assume ai1 ai2 ... ain ∈ yield(Ni). Then P generates the conjunctive query

C : p(X,Y,W,Z,N1,...,Nm) :- B, f(Ni), f(W).

where B is a binary chain representing ai1 ... ain.

Proof. Assume that Ni derives δ in n steps. We prove our result by induction on n. If n = 1,
then Ni → δ is a production and our result follows by construction. Assume the truth of the
hypothesis for i < n, where n > 1. Assume that Ni derives δ in n steps; then δ is of the form
αβγ, where Ni derives αNjγ in n - 1 steps, and where dk : Nj → β is a production. By our
inductive hypothesis, P generates a legal top-down expansion

S : p(X,Y,W,Z,N1,...,Nm) :- A, p(<Ua,V1><<Nj>>), Γ, f(Ni), f(W).

where (A, p(<Ua,V1><<Nj>>), Γ) is a chain from X to Y, embedded in S and representing
αNjγ, A is a chain from X to Ua representing α in S, and Γ is a chain from V1 to Y
representing γ in S. We construct T by expanding the indicated p-atom through the rule
rk corresponding to the production dk, and our result follows by Lemma 4.12. □

Lemma 4.14 Let T be a top-down expansion in legal(P). Then T is of the form

T : p(X,Y,W,Z,N1,...,Nm) :- B, f(Ni), f(W).

where B is a chain from X to Y, embedded in T and representing a sentential form β ∈
yield(Ni) for some Ni.

Corollary. Let C be a conjunctive query in legal(P). Then C is of the form

p(X,Y,W,Z,N1,...,Nm) :- B, f(Ni), f(W).

where B is a binary chain representing some string ai1 ... ain ∈ yield(Ni).

Proof. Let T be a top-down expansion in which n rules are applied. We prove our result
by induction on n.

If n = 1, then the rule applied is of the form

rk : p(X,Y,W,Z,N1,...,Nm) :- B, f(Ni), f(W).

where B is a chain representing some string β and

dk : Ni → β

is a production in G.

Assume the truth of the hypothesis for i < n, where n > 1. Let the following be a legal
top-down expansion generated by n rule applications.

T : p(X,Y,W,Z,N1,...,Nm) :- Δ.

T must be generated by the expansion of some p-atom in some legal top-down expansion S
(generated by n - 1 rule applications) through some rule rk. By our inductive hypothesis,
S is of the form

S : p(X,Y,W,Z,N1,...,Nm) :- A, p(<Ua,V1><<Nj>>), Γ, f(Ni), f(W).

where Ni ⇒ αNjγ, and where the conjunction

(A, p(<Ua,V1><<Nj>>), Γ)

is a chain from X to Y, embedded in S and representing αNjγ (so that A is a chain from
X to Ua representing α in S and Γ is a chain from V1 to Y representing γ in S). Consider
the rule rk that is used to expand the indicated p-atom in S to obtain T. By construction,
the rule is of the form

rk : p(X,Y,W,Z,N1,...,Nm) :- B, f(Nm), f(W).

such that

dk : Nm → β

is a production in G. By Lemma 4.12 and the assumed legality of T, we must have Nm = Nj,
hence Ni ⇒ αβγ and our result follows by Lemma 4.12. □

4.5.2 Using the construction

Let G1 be an ε-free grammar over Σ = {a1,...,ak}, with start symbol N1 and intensional
nonterminals N1,...,Nm. Let G2 be an ε-free grammar over Σ, with start symbol M1 and
nonterminals M1,...,Ml. Without loss of generality, we assume that the nonterminals of
G1 and G2 are distinct.

Let S be a new nonterminal. Construct an ε-free grammar G3 (with start symbol S)
as the union of all the productions in G1 and G2, and add the two productions s1 and s2
described below.

s1 : S → N1

s2 : S → M1

L(G3) is clearly L(G1) ∪ L(G2).

Let G4 be the ε-free grammar obtained from G3 by deleting the production s1. L(G4)
is clearly the same as L(G2); we have merely introduced a unit production to change the
start symbol. Note that for any Ni or Mi, the yield of this nonterminal is the same for G3
and G4; that is, the only nonterminal for which the yields may differ in G3 and G4 is the
start symbol S.

Apply Algorithm 4.1 to G3 and G4 to create programs P and Q respectively, ensuring
that the rule head in each case is p(X,Y,W,Z,S,N1,...,Nm,M1,...,Ml); that is, the
nonterminals of G3 and G4 appear in the same positions in the heads of all rules in P and Q.
Hence, every rule in Q also appears in P, and we may conclude that Q ⊆ P.
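
The two language claims above, L(G3) = L(G1) ∪ L(G2) and L(G4) = L(G2), can be sanity-checked on small instances by enumerating strings up to a length bound. The sketch below is only a bounded check; the two sample grammars and the name `strings_up_to` are assumptions chosen for illustration. It relies on ε-freeness: a sentential form never gets shorter, so forms longer than the bound can be discarded.

```python
def strings_up_to(productions, start, nonterminals, max_len):
    """Terminal strings of length <= max_len generated by an ε-free grammar."""
    seen = {(start,)}
    frontier = [(start,)]
    result = set()
    while frontier:
        form = frontier.pop()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            result.add("".join(form))
            continue
        for lhs, rhs in productions:
            if lhs == form[idx]:
                new = form[:idx] + tuple(rhs) + form[idx + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
    return result


# Hypothetical sample grammars: G1 generates a+, G2 generates a*b.
G1 = [("N1", ["a", "N1"]), ("N1", ["a"])]
G2 = [("M1", ["a", "M1"]), ("M1", ["b"])]
s1, s2 = ("S", ["N1"]), ("S", ["M1"])
G3 = G1 + G2 + [s1, s2]
G4 = [d for d in G3 if d != s1]
nts = {"S", "N1", "M1"}

L1 = strings_up_to(G1, "N1", nts, 3)
L2 = strings_up_to(G2, "M1", nts, 3)
print(sorted(strings_up_to(G3, "S", nts, 3)))
print(sorted(strings_up_to(G4, "S", nts, 3)))
```

A bounded enumeration of this kind can refute a containment on a specific string but cannot establish it in general, which is consistent with the undecidability results that follow.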

Lemma 4.15 P ⊆ Q iff L(G1) ⊆ L(G2).

Corollary. P ≡ Q iff L(G1) ⊆ L(G2).

Proof. By the construction of G3 and G4, the strings generated by any nonterminal N ≠ S are
the same for G3 and G4.

Assume that L(G1) ⊆ L(G2). Then L(G3) ⊆ L(G4) by the construction of G3 and
G4, and any string generated by any nonterminal in G3 is also generated by the same
nonterminal in G4. Consider any conjunctive query C generated by P; if C is illegal, then
C is contained in the basis rule b of Q. If C is legal, then by Lemma 4.14, assumption,
Lemma 4.13 and Lemma 4.9, C is contained in some legal conjunctive query D generated
by Q.

For the converse, assume that P ⊆ Q. Then every legal conjunctive query C_P generated
by P is contained in some legal conjunctive query C_Q generated by Q, and Lemmas 4.13,
4.14 and 4.9 suffice to complete the proof. □

Theorem 4.1 Let P and Q be safe, linear, single-IDB programs with head-rectified rules
and five basis rules. Then P ⊆ Q (P ≡ Q) is undecidable.

Proof. Let G1 and G2 be linear bounded-basis grammars over the alphabet Σ = {a1,a2}
and apply the construction of the preceding lemma; our result follows by Lemmas 4.15
and 4.4. Safety is imposed as in Section 4.4. Note that, if G1 and G2 are bounded-basis,
then Lemma 4.15 holds even if G1 and G2 have the same extensional nonterminals, and the
number of basis rules can therefore be reduced to three. □

Now, let G1 be the following linear grammar over Σ = {a1,a2}.

d1 : N1 → a1N1

d2 : N1 → a2N1

d3 : N1 → a1

d4 : N1 → a2

G1 generates {a1 + a2}+. Let G2 be a linear, bounded-basis grammar over Σ, with
nonterminals M1,...,Ml and start symbol M1. We assume without loss of generality that N1 is
distinct from each Mi. Construct the grammar G3 with new start symbol S by taking the
union of the productions in G1 and G2, and adding the production

s : S → N1M1.

G3 generates Σ+L(G2). Apply Algorithm 4.1 to G3 to produce the program P. Note that
P has five basis rules and only one nonlinear rule (corresponding to the production s).

Consider the legal conjunctive queries in P. Since S appears on the left-hand side of
only the production s and does not appear on the right-hand side of any production, the
nonlinear rule representing the production s is used only in the simulation of strings in
yield(S), and the rule can only be used at the root of a top-down expansion involved in
such a simulation.

Theorem 4.2 Let P be a safe, head-rectified, single-IDB program with five basis rules and
one nonlinear rule. The base-case linearizability of P is undecidable.

Proof. Let G3 be the grammar of the preceding discussion, and let P be the result of applying
Algorithm 4.1 to G3. Any illegal conjunctive query generated by P is contained in the basis
rule b. Every legal conjunctive query representing a string in yield(N1) or yield(Mi) is
linear, and hence right-linear. The legal conjunctive queries representing strings in yield(S)
simulate Σ+L(G2), and the right-linear subset of these queries simulates ΣL(G2); our result
follows by Lemma 4.5. Safety is imposed as in Section 4.4. □

4.6 Single-recursive-rule programs

In this section, we present a construction whereby an arbitrary Modified Chomsky Normal
Form (MCNF) grammar may be simulated using a head-rectified, single-IDB program with
a single recursive rule and a bounded number of basis rules. The construction may be used
to show that sequencability is undecidable, even for programs with only two recursive rules.
In addition, the construction can be used to prove the undecidability of equivalence (or
containment) of programs with a single recursive rule.

4.6.1 The construction

Let H be an MCNF grammar over Σ = {a1,a2}, with nonterminals N1,...,Nm, start
symbol N3 and extensional productions N1 → a1 and N2 → a2. We construct a program
P, with one recursive rule, to simulate H.

The program P defines the IDB predicate p, and the head of each rule is

p(R,X,Y,W,Z,G,A,B,N1,...,Nm)

As before, <<Ni>> denotes an m-vector in which the ith component is W, and in which
all other components are Z. The expression

p(<K,L><M><R,T><<Ni>>)

will be used to represent the p-atom

p(Z,K,L,W,Z,M,R,T,<<Ni>>)

That is, the first argument is always Z, and the 4th and 5th are always W and Z
respectively (so that W and Z appear in their "home" positions). The EDB predicates in
V  are  ^  and  ft. 

The variables in the head of each rule have the following purposes:

1. R is a switch that is relevant only to the proof that sequencability is undecidable.
The R-position in the arguments of each p-atom in the body of the recursive rule will
be occupied by the variable Z, described below.

2. X and Y are the end-points of binary chains representing strings in L(H), as in the
preceding section.

3. W and Z are guard variables, as in the preceding section. They are used to weed
out illegal conjunctive queries generated by the programs (i.e., queries representing
impossible derivations in the grammar).

4. G is a guard position. Intuitively, a p-atom may be legally expanded through the
recursive rule if its G-th argument is W, but not if its G-th argument is Z.

5. A and B are used to allow a choice in expanding one of two or-productions, in a
manner to be described.




CHAPTER  4.  UNDECIDABILITY  OF  THE  GENERAL  PROBLEMS 


6. N1,...,Nm represent the corresponding nonterminals in the grammar.

The rules of P are constructed as follows.

Algorithm 4.2

INPUT: an MCNF grammar G with nonterminals N1,...,Nm, where N1 is the extensional
nonterminal representing a1 and N2 is the extensional nonterminal representing a2.

OUTPUT: a head-rectified, single-IDB program P with one recursive rule r and 9 basis
rules (i1 ... i9).

1. The head of each rule is p(R,X,Y,W,Z,G,A,B,N1,...,Nm).

2. P has the following 9 basis rules:

p(R,X,Y,W,Z,G,A,B,N1,...,Nm) :-

i1 : a(X,Y), f(N1), f(G).
i2 : b(X,Y), f(N2), f(G).
i3 : f(Z).
i4 : g(G).
i5 : g(W).
i6 : f(U), g(U).
i7 : h(A,B), f(G).
i8 : h(U,V), h(V,W).

3. The body of the recursive rule r for P has the atoms f(W), f(G), g(Z), g(N1), g(N2)
and h(E,F), where E and F are nondistinguished variables that appear nowhere else
in the program. The body of r also has p-atoms for each intensional production in
the grammar, as follows:

(a) Let J → K be a copy production, and let Uj be a new nondistinguished variable.
Then, the recursive rule contains the atom

p(<X,Y><J><Uj,Uj><<K>>)

This atom (and any version of it in a top-down expansion) is called a copy-atom
representing K in the production.

(b) Let J → K and J → L be a pair of or-productions, and let Tj, Uj and Vj be
new nondistinguished variables. Then, the body of r contains the atoms

p(<X,Y><J><Tj,Uj><<K>>)
p(<X,Y><J><Uj,Vj><<L>>)

The first of these is the or-atom representing K in the first production, and
the second is the or-atom representing L in the second production. Versions of
these two atoms in a top-down expansion such that both atoms have the same
parent are called sibling or-atoms. The idea is that we need never recursively
expand both these p-atoms, since either one (but not both) may be expanded
using initialisation rule i7.

(c) Let J → KL be an and-production, and let Tj, Uj and Vj be new nondistinguished
variables. r has the two p-atoms

p(<X,Tj><J><Uj,Uj><<K>>)
p(<Tj,Y><J><Vj,Vj><<L>>)

The first of these atoms is called the and-atom representing K in the production,
and the second is called the and-atom representing L in the production. Versions
of these atoms that have the same parent are called sibling and-atoms. The idea
is that both atoms are (recursively) expanded, to create chains from X to Tj,
and from Tj to Y.

□


Example 4.8 The following MCNF grammar G over Σ = {c,d} generates ΣΣ+.

S → TT    T → S    T → H    H → C    H → D

C → c    D → d

The corresponding program is

r : p(R,X,Y,W,Z,G,A,B,C,D,S,T,H) :-
    p(Z,X,V,W,Z,S,I1,I1,Z,Z,Z,W,Z),
    p(Z,V,Y,W,Z,S,I2,I2,Z,Z,Z,W,Z),
    p(Z,X,Y,W,Z,T,I3,I4,Z,Z,W,Z,Z),
    p(Z,X,Y,W,Z,T,I4,I5,Z,Z,Z,Z,W),
    p(Z,X,Y,W,Z,H,I6,I7,W,Z,Z,Z,Z),
    p(Z,X,Y,W,Z,H,I7,I8,Z,W,Z,Z,Z),
    f(W), f(G), g(Z), g(C), g(D), h(E,F).

p(R,X,Y,W,Z,G,A,B,C,D,S,T,H) :-

i1 : c(X,Y), f(C), f(G).
i2 : d(X,Y), f(D), f(G).
i3 : f(Z).
i4 : g(G).
i5 : g(W).
i6 : f(U), g(U).
i7 : h(A,B), f(G).
i8 : h(U,V), h(V,W).
i9 : h(U 1 V).

In the recursive rule, the first two p-atoms represent the and-production S → TT; hence,
these atoms are and-atoms. The next two p-atoms represent the or-productions T → S
and T → H, and are therefore or-atoms. Similarly, the 5th and 6th p-atoms are or-atoms
representing the or-productions H → C and H → D. The initialisation rules i1 and i2
represent the extensional productions C → c and D → d respectively. □
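
Step 3 of Algorithm 4.2 can be sketched in Python as follows. This is an illustrative reading of the construction, not the author's code: the argument conventions, the variable-naming scheme (I1, I2, ... for the new A/B-position variables and V1, V2, ... for the link variable of each and-production) and the function name `recursive_rule` are assumptions, and nonterminal names are assumed not to clash with any variable name.

```python
from itertools import count


def recursive_rule(nonterminals, ext1, ext2, ands=(), or_pairs=(), copies=()):
    """Body of the single recursive rule r for an MCNF grammar.

    ands:     triples (J, K, L) for and-productions J -> KL
    or_pairs: triples (J, K, L) for pairs of or-productions J -> K, J -> L
    copies:   pairs (J, K) for copy productions J -> K
    ext1, ext2: the two extensional nonterminals
    """
    fresh = count(1)

    def new():
        return f"I{next(fresh)}"

    def atom(k, l, m, r, t, nt):
        # p(<K,L><M><R,T><<Ni>>) stands for p(Z,K,L,W,Z,M,R,T,<<Ni>>).
        vec = tuple("W" if n == nt else "Z" for n in nonterminals)
        return ("p", ("Z", k, l, "W", "Z", m, r, t) + vec)

    body = []
    for n, (j, left, right) in enumerate(ands, start=1):
        link, u, v = f"V{n}", new(), new()
        body += [atom("X", link, j, u, u, left), atom(link, "Y", j, v, v, right)]
    for j, first, second in or_pairs:
        t, u, v = new(), new(), new()
        body += [atom("X", "Y", j, t, u, first), atom("X", "Y", j, u, v, second)]
    for j, k in copies:
        u = new()
        body.append(atom("X", "Y", j, u, u, k))
    body += [("f", ("W",)), ("f", ("G",)), ("g", ("Z",)),
             ("g", (ext1,)), ("g", (ext2,)), ("h", ("E", "F"))]
    head = ("p", ("R", "X", "Y", "W", "Z", "G", "A", "B") + tuple(nonterminals))
    return (head, body)


# The grammar of Example 4.8, with the nonterminals in head order C, D, S, T, H.
head, body = recursive_rule(["C", "D", "S", "T", "H"], "C", "D",
                            ands=[("S", "T", "T")],
                            or_pairs=[("T", "S", "H"), ("H", "C", "D")])
for a in body:
    print(f"{a[0]}({','.join(a[1])})")
```

For the grammar of Example 4.8, the six p-atoms printed agree with the recursive rule of the example up to the name of the link variable (V1 here, V in the example).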

Intuition

The intention is that the program P simulate the strings in the grammar G, by generating
binary chains to represent the strings in L(G). The basic ideas are as follows.

Illegal expansions As in the previous section, we detect illegal top-down expansions
through the existence of the atom f(Z) in the expansion. The idea is as follows. Consider
any top-down expansion using only the recursive rule; as in the previous section, the
variables W and Z are persistent in the p-atoms of the expansion. At each stage of the
expansion, p-atoms in the body of the recursive rule are "activated" by placing W in the
guard (6th) position of the atom, or "deactivated" by placing Z in this position. If the
guard position of a p-atom is Z, then the application of the recursive rule will introduce
the atom f(Z), and the expansion is illegal; hence, such atoms must be terminated by a
basis rule in all legal expansions. The activation of atoms in the body of the recursive rule
enforces the fact that legal expansions correspond to derivations in the grammar.

Example 4.9 Consider the grammar and program of Example 4.8. Assume that at some
stage, the atom corresponding to the production T → H has been activated; that is, the
atom is of the form

p(Z, A, B, W, Z, W, I4, I5, Z, Z, Z, Z, W)

Applying  the  recursive  rule  to  this  atom  yields  atoms  of  the  form 

p(Z,  A,  V,  IT,  Z,  Z.  h,Iu  Z.  Z.  Z,  Z.  Z), 
p(Z,  T,  IT,  Z,  Z,  h,  h.  Z.  Z,  Z,  Z,  Z ), 
p(Z,  A,  B,  IT.  Z.  Z,  /3,  /4, 2.  Z,  Z.  Z,  Z), 
p(  Z,  A,  5,  IT,  Z,  Z,  /4.  h.  Z.  Z,  Z.  Z.  Z), 
p(Z,  A,  B,  IT,  Z,  W,  /e,  h.  IT.  Z,  Z,  Z), 
p{Z,  A,  5,  li;  Z.  IT,  /t,  /s.  Z.  it,  Z.  Z.  Z ), 
f(W), f(W), g(Z), g(C), g(D), h(E, F).

Note that the first four atoms are deactivated (with Z in position 6), and the last two
atoms are activated (with W in position 6). Note also that the atoms that are activated
are precisely the atoms representing productions whose head is H, as required by the fact
that the atom corresponding to the production T → H has been expanded. Finally, note
that the 5th atom (the first to be activated) represents the production H → C, and the
argument representing the nonterminals in this atom is <<C>>. Similarly, note that the
6th atom (the second to be activated) represents the production H → D, and the argument
representing the nonterminals in this atom is <<D>>. □
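The activation pattern of Example 4.9 can be illustrated with a minimal Python sketch. It models only the guard position, not the full p-atoms. The names `PRODUCTIONS` and `guards_after_expansion` are hypothetical, and the sketch assumes that the recursive rule carries one p-atom per right-hand-side symbol of each production, in the order described for the example grammar.

```python
# Productions of the example grammar, listed in the order in which their
# p-atoms are described for the recursive rule (an assumption of this sketch).
PRODUCTIONS = [
    ("S", ("T", "T")),  # and-production: contributes two and-atoms
    ("T", ("S",)),      # or-production
    ("T", ("H",)),      # or-production
    ("H", ("C",)),      # or-production
    ("H", ("D",)),      # or-production
]


def guards_after_expansion(productions, derived_nonterminal):
    """Return one guard per p-atom: "W" (activated) or "Z" (deactivated).

    An atom is activated exactly when the head of the production it
    represents is the nonterminal that was just derived.
    """
    guards = []
    for head, body in productions:
        guard = "W" if head == derived_nonterminal else "Z"
        # Simplification: one p-atom per symbol on the right-hand side.
        guards.extend([guard] * len(body))
    return guards


# Expanding the atom for T -> H derives H, so only the atoms for
# H -> C and H -> D are activated.
print(guards_after_expansion(PRODUCTIONS, "H"))
# prints ['Z', 'Z', 'Z', 'Z', 'W', 'W']
```

The printed list matches the example: the first four atoms carry Z in the guard position and the last two carry W.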

Or-productions Another wrinkle is the fact that if an or-production is used in a derivation
in the grammar, then there are two activated atoms in the simulating expansion. The
idea, here, is that either one of these atoms may be terminated by the initialisation rule i7, but
not both (otherwise the result is contained in the initialisation rule i8).

Example 4.10 Consider the expansion of Example 4.9, in which the or-atoms corresponding
to the or-productions H → C and H → D have been activated. Either one of these atoms
may be terminated by rule i7. However, if both are terminated in this way, then the result
contains the atoms h(I6, I7), h(I7, I8), and the result is contained in i8. □
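The containment test behind this example can be sketched in Python as a search for a containment mapping from a rule body into the body of an expansion. This is only an illustrative sketch: the function name `find_containment_mapping`, the tuple encoding of atoms, and the variable name `T` in the two-atom pattern are assumptions, and every rule variable is treated as nondistinguished unless it is listed in `distinguished`.

```python
def find_containment_mapping(rule_body, query_body, distinguished=frozenset()):
    """Search for a mapping from rule variables to query variables.

    Atoms are tuples (predicate, arg1, arg2, ...) whose arguments are
    variable names. Distinguished variables must map to themselves.
    Returns the mapping as a dict, or None if no mapping exists.
    """
    def extend(mapping, rule_atom, query_atom):
        # The predicate and arity must agree.
        if rule_atom[0] != query_atom[0] or len(rule_atom) != len(query_atom):
            return None
        new = dict(mapping)
        for rule_var, query_var in zip(rule_atom[1:], query_atom[1:]):
            if rule_var in distinguished and rule_var != query_var:
                return None
            # A variable already mapped must keep the same image.
            if new.setdefault(rule_var, query_var) != query_var:
                return None
        return new

    def search(index, mapping):
        if index == len(rule_body):
            return mapping
        for query_atom in query_body:
            new = extend(mapping, rule_body[index], query_atom)
            if new is not None:
                result = search(index + 1, new)
                if result is not None:
                    return result
        return None

    return search(0, {})


# A two-step h-chain pattern, modelled on the pattern discussed in the example.
chain_rule = [("h", "U", "V"), ("h", "V", "T")]
both_terminated = [("f", "W"), ("h", "I6", "I7"), ("h", "I7", "I8")]
one_terminated = [("f", "W"), ("h", "I6", "I7")]

print(find_containment_mapping(chain_rule, both_terminated) is not None)  # True
print(find_containment_mapping(chain_rule, one_terminated) is not None)   # False
```

An expansion that contains both h-atoms admits a mapping and is therefore illegal, while an expansion with a single h-atom does not match the pattern.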

The  simulation 

We say that a conjunctive query (or top-down expansion) generated by the program is illegal
if it is contained in one of the initialisation rules i1, ..., i9, and legal otherwise. No interesting
top-down expansions are contained in i1 or i2, as the following lemma shows.

Lemma 4.16 Let T be a top-down expansion in which the recursive rule r is used. Then
T ⊈ i1 and T ⊈ i2.

Proof. The root of T is expanded using r, since r is the only recursive rule in the program.
T, i1 and i2 have the same root (the rule head that is common to all rules), and any
containment mapping c from i1 or i2 into T must satisfy c(N1) = N1 and c(N2) = N2.
Now, f(N1) appears in the body of i1 and f(N2) appears in the body of i2. However,
neither N1 nor N2 appears in the body of r, and hence neither f(N1) nor f(N2) appears in
T. Thus, there is no containment mapping c from i1 or i2 into T. □

The  following  lemmas  show  us  that  most  of  the  initialisation  rules  may  never  be  used 
in  any  interesting  top-down  expansion. 

Lemma 4.17 Let T be a top-down expansion in which one of the basis rules i3, i5, i6, i8
and i9 is used. Then T is illegal.

Corollary. Only the rules r, i1, i2, i4 and i7 are used in any legal top-down expansion.

Proof. The head of T is the rule head that is common to all rules. By Lemma 4.10, W and
Z appear in their “home” positions in every p-atom in T. Hence, if one of these rules, say ij,
is used in T, then T is contained in ij. □

Lemma 4.18 Assume that T is a legal top-down expansion that includes a p-atom in which
the 6th argument (the “home” position for G) is W. Then one of r, i1, i2 and i7 must be
used to expand such a p-atom.

Proof. By Lemma 4.17, one of r, i1, i2, i4 and i7 must be used to expand this p-atom.
However, if i4 is used, then g(W) appears in the body of T, and T is contained in i5. □

Lemma 4.19 Assume that T is a legal top-down expansion that includes a p-atom in which
the 6th argument (the “home” position for G) is Z. Then only i4 is used to expand any
such atoms.

Proof. By Lemma 4.17, rules i3, i5, i6, i8 and i9 may not be used in the expansion. If one of
r, i1, i2 and i7 is used, then f(Z) appears in the body of T, and T ⊆ i3. □

Lemma 4.20 In any legal top-down expansion T, no copy-atom or and-atom is expanded
using initialisation rule i7.

Proof. Assume the converse; then the body of T contains the atom h(G, G) for some variable
G, and T therefore is contained in initialisation rule i9. □

Lemma 4.21 In any legal top-down expansion T, no two sibling or-atoms are expanded
using rule i7.

Proof. Otherwise, T is contained in initialisation rule i8. □

Lemma 4.22 Let T be a legal top-down expansion in which s and t are two sibling or-atoms
(or sibling and-atoms). If s is expanded through the initialisation rule i4, then so is t.

Proof. Since s and t are siblings, both have the same 6th argument (the argument in the
home position for G), say S. If s is expanded through i4, then g(S) appears in the body
of T. Since T is assumed to be legal, Lemma 4.17 requires that one of the rules r, i1, i2, i4
and i7 is used to expand t; however, if any rule but i4 is used, then f(S) is a conjunct in
the body of T, and T is contained in rule i6 (and is therefore illegal). □


Lemma 4.23 In any top-down expansion T, i7 may only be used to expand one of two
sibling or-atoms, and the sibling or-atom is expanded using one of r, i1 and i2.

Proof. By Lemmas 4.21 and 4.22. □


Lemma 4.24 Let T be a legal top-down expansion in which some atom

p(<I1,I2><I3><I4,I5><<Nk>>)

(with arbitrary Ij) is expanded through the recursive rule r. Then Nk ≠ N1 and Nk ≠ N2.

Proof. Assume the converse; then, the body of the recursive rule has the atom g(W), and
T is contained in i5. □

Lemma 4.25 Let T be a legal top-down expansion in which some atom

p(<I1,I2><I3><I4,I5><<Nk>>)

(with arbitrary Ij) is expanded through the initialisation rule ij, where j ∈ {1, 2}. Then
Nk = Nj.

Proof. Assume the converse. Then f(Z) appears in the body of T, and T ⊆ i3. □

Lemma 4.26 Let T be a conjunctive query generated by P in which the recursive rule r
is used at least twice. Then every p-atom at depth n > 1 in T has as 6th argument (the
argument in the “home” position of G) either W or Z.

Proof. By Lemma 4.10, W and Z appear in their home positions in every p-atom in T. By
construction, every variable in the home position for any Ni in any p-atom in T is W or Z,
and the “guard” position is therefore occupied by W or Z in any child of such an atom. □

Let us say that a conjunctive query is restricted if whenever one of two sibling or-atoms
is expanded through r, i1 or i2, then its sibling atom is expanded through i7.

Lemma  4.27  Every  legal  conjunctive  query  C  is  contained  in  a  conjunctive  query  D  that 
is  both  legal  and  restricted. 

Proof. Let us assume that one of the sibling or-atoms

p(<U,V><S><I1,I2><<Ni>>)

and

p(<U,V><S><I2,I3><<Nj>>)

is expanded through r, i1 or i2 in the top-down expansion T establishing the conjunctive
query C. Since C is legal, the root of T is expanded using rule r, and C therefore contains
an atom of the form h(E, F). By Lemma 4.26 and Lemma 4.19, S is W or G, and by
construction f(S) appears in the body of C. By Lemmas 4.18 and 4.22, the sibling atom
is expanded through one of r, i1, i2 and i7. If i7 is not used, then since the distinguished
variables A and B appear nowhere in the bodies of r, i1 or i2, the variables I1, I2 and I3
appear nowhere in the fringe of T (i.e., in the body of C). Thus, we may construct a new
conjunctive query C' from C, by expanding the first or-atom through rule i7 to produce
the atoms f(S) and h(I1, I2) for variables I1 and I2 that appear nowhere else in the new
conjunctive query C'. C is contained in C' because f(S) and an atom of the form h(E, F)
appear in C. An inductive repetition suffices to remove all violations of restrictedness in
C, while preserving the legality of C. □

For any distinct variables U and V, define a {U, V}-atom corresponding to an intensional
nonterminal Ni to be any p-atom

p(<U,V><W><H,I><<Ni>>)

for any variables H and I, and let chains be defined as in the preceding section. Note that,
depending on the variables H and I in any p-atom, a string of terminals and nonterminals
has many corresponding chains.

Definition 4.6 Define a top-down expansion T to be closed if T is legal and restricted, and
if the following hold.

1. Any p-atom in which the 6th argument (the argument representing the guard G) is
the distinguished variable Z is expanded through i4; that is, no such p-atom is a leaf.

2. Consider any two sibling or-atoms neither of which is expanded using i4. Then one
of these atoms is expanded using rule i7.

3. No p-atoms at depth 1 in the tree are leaves; that is, every p-child of the root is
expanded through some rule.

By Lemmas 4.19, 4.18, 4.22 and 4.26, every top-down expansion establishing a legal,
restricted conjunctive query is closed. □

The following lemmas formalise the simulation of the grammar G by the program P.

Lemma 4.28 Let

p(<U,V><S><I1,I2><<Ni>>)

be an atom in a closed top-down expansion R, where S is one of W and G, where U ≠ V,
and where I1 and I2 are arbitrary. If this p-atom is expanded using one application of one
of r, i1 and i2 to produce a closed top-down expansion T, then the subtree rooted at this
p-atom contains the following.

1. The atom f(S), and 0 or more occurrences of the atoms f(W) and g(Z).

2. 0 or more occurrences of atoms h(P, Q) for distinct variables P and Q.

3. A chain from U to V representing γ, where Ni → γ is a production.

Proof. By Lemma 4.25, the rule ik (k ∈ {1, 2}) may be used only if Ni = Nk; the subtree
then contains the atoms f(S), f(W) and ak(U, V). Recall that the extensional productions
are of the form N1 → a1 and N2 → a2.

Assume that r is used. By Lemma 4.24, Ni ≠ N1 and Ni ≠ N2; that is, Ni is an
intensional nonterminal. Application of r yields the atoms f(S) and f(W), two copies of
g(Z) (since Ni ≠ N1 and Ni ≠ N2), and an atom of the sort h(E', F') where E' and F' are
distinct, new variables obtained by renaming E and F in the top-down expansion. Consider
the p-atoms that are generated by this application of r. By construction, the 6th position of
any atom representing a production Nj → β is Z, if Nj ≠ Ni; hence, by Lemma 4.19, every
such atom must be expanded using rule i4 (yielding the atom g(Z)). Also by construction,
the 6th argument of the atom(s) representing any production Ni → γ is W. There are three
cases.


1. Ni → Nk is a copy-production. By construction, the corresponding atom in the
indicated application of r is

p(<U,V><W><J1,J2><<Nk>>)

and our result follows.

2. Ni → Nk and Ni → Nℓ are or-productions. By construction, the corresponding atoms
in the indicated application of r are

p(<U,V><W><J1,J2><<Nk>>)

p(<U,V><W><J2,J3><<Nℓ>>)

for some nondistinguished variables J1, J2 and J3 that appear nowhere else in T (since
these variables are renamed in the expansion). By Lemma 4.18, and since T is assumed
closed, one of these atoms must be expanded through i7 to add the atoms f(W) and
h(J1, J2) (or h(J2, J3)) to T. Our result follows.

3. Ni → Nk Nℓ is an and-production, and the corresponding atoms in the indicated
application of r are

p(<U,I><W><J1,J2><<Nk>>)

p(<I,V><W><J2,J3><<Nℓ>>)

where I is a renaming of a variable in the body of r to a new variable that appears nowhere
else in the expansion. Our result follows.

□ 


Lemma 4.29 Let

p(<U,V><S><I1,I2><<Ni>>)

be an atom in a closed top-down expansion R, where S is one of W and G, where U ≠ V,
and where I1 and I2 are arbitrary. If Ni → γ is a production in the grammar G, then
there is a closed top-down expansion T obtained by expanding this p-atom using exactly
one application of r, i1 or i2, but with an arbitrary number of uses of other rules, such that
the subtree rooted at this p-atom contains the following atoms.

1. The atom f(S), and 0 or more occurrences of the atoms f(W) and g(Z).

2. 0 or more occurrences of atoms h(P, Q) for distinct variables P and Q.

3. A chain from U to V representing γ.

Proof. If i = 1 or i = 2, then we may apply i1 or i2 to the indicated p-atom to obtain the
result.

Assume that Ni is an intensional nonterminal. Expand the indicated p-atom through
the recursive rule r. Then the “non-p” children of the indicated p-atom are f(S), f(W),
g(Z), and h(E', F') where E' and F' are distinct, new variables obtained by renaming E and
F in the top-down expansion. Consider the p-atoms that are generated by this application
of r. By construction, the 6th position of any atom representing a production Nj → β is
Z, if Nj ≠ Ni; hence, every such atom may be expanded using rule i4 (yielding the atom
g(Z), which appears as a child of the root). Also by construction, the 6th argument of the
atom(s) representing any production Ni → γ is W. There are three cases.

1. Ni → Nk is a copy-production. By construction, the corresponding atom in the
indicated application of r is

p(<U,V><W><J1,J2><<Nk>>)

and our result follows.

2. Ni → Nk and Ni → Nℓ are or-productions. By construction, the corresponding atoms
in the indicated application of r are

p(<U,V><W><J1,J2><<Nk>>)
p(<U,V><W><J2,J3><<Nℓ>>)

for some nondistinguished variables J1, J2 and J3 that appear nowhere else in T (since
these variables are renamed in the expansion). Without loss of generality, assume that
γ is Nk. Then, the latter atom may be expanded through i7 to add the atoms f(W)
and h(J2, J3) to T. Our result follows.

3. Ni → Nk Nℓ is an and-production. Then, the corresponding atoms in the indicated
application of r are

p(<U,I><W><J1,J2><<Nk>>)
p(<I,V><W><J2,J3><<Nℓ>>)

where I is a renaming of a variable in the body of r to a new variable that appears nowhere
else in the expansion. Our result follows.

□ 

For any nonterminal J in G, recall that yield(J) represents all the strings generated by
J (i.e., the strings that would be generated if J were the start symbol of G).

Lemma 4.30 Let T be a closed top-down expansion generated by P. Then the body of T
consists of the atoms f(W), f(G), g(Z), g(N1) and g(N2); atoms of the form h(U, V), where
U and V do not appear in the root; and for 3 ≤ i ≤ m, either g(Ni), or f(Ni) and a chain
representing some string δ of terminals and nonterminals such that Ni ⇒ δ.

Corollary. Let C be a conjunctive query generated by P, established by a closed top-down
expansion T. Then the body of C consists of the atoms f(W), f(G), g(Z), g(N1) and g(N2);
atoms of the form h(U, V), where U and V do not appear in the root; and for 3 ≤ i ≤ m,
either g(Ni), or f(Ni) and a chain representing some terminal string δ such that Ni ⇒ δ.

Proof. Let T be a closed top-down expansion generated by P. Since T is assumed closed, the
root is expanded through the recursive rule r. For any intensional nonterminal Ni, consider
all the atoms representing productions of the form Ni → α. The proof proceeds by induction on
n, the number of applications of r, i1 or i2 to atoms representing productions of which Ni
is the head.

If n = 1, then by Lemma 4.22, rule i4 is used to expand all such atoms, and the atom
g(G) is introduced.

If n = 2, then rule r is used and the result follows by Lemma 4.28; note that the 6th
argument (the “home” position for G) in any p-atom at depth 1 is G by construction. The
induction also follows by Lemma 4.28.

To prove the corollary, we observe that in any conjunctive query (with EDB's at the
leaves), if h(U, V) is obtained by the expansion of an or-atom through i7, then the occurrence
of U (or V) in the sibling or-atom disappears when r, i1 or i2 is used to expand the latter,
since the distinguished variables A and B do not appear in the body of these rules. □

The  converse  is  also  true. 

Lemma 4.31 Assume that Ni ⇒ δ, where i > 2. Then, P generates a closed top-down
expansion T, as follows. The body of T consists of the atoms f(W), f(G), g(Z), g(N1) and
g(N2); atoms of the form h(U, V), where U and V do not appear in the root; and either
g(Ni), or f(Ni) and a chain representing δ.

Corollary. Assume that Ni ⇒ δ, where δ is a string of terminals and i > 2. Then, P
generates a closed top-down expansion T generating a conjunctive query C, as follows. The
body of C consists of the atoms f(W), f(G), g(Z), g(N1) and g(N2); atoms of the form
h(U, V), where U and V appear nowhere else in C; and either g(Ni), or f(Ni) and a binary
chain representing δ.

Proof. Assume that Ni ⇒ δ. Our proof proceeds by induction on n, the depth of the
sentential forms derived by Ni in G. The basis (n = 1) follows by Lemma 4.29 (since the
6th position in any child of the root is G, if r is used to expand the root), and the induction
also follows by Lemma 4.29.

The  corollary  is  proved  as  in  the  previous  lemma.  □ 

4.6.2  Using  the  construction 

Let G1 be an arbitrary MCNF grammar over the alphabet Σ = {a1, a2}, with extensional
nonterminals N1 and N2 representing a1 and a2 respectively, start symbol N1 and
nonterminals N1, ..., Nn. Let G2 be an arbitrary MCNF grammar over the same alphabet, with
the same extensional nonterminals representing the same terminals, and with start symbol
L1 and nonterminals L1, ..., Lm. Create a new start symbol S; construct the new grammar
G3 with start symbol S, with productions that are the union of the productions in G1 and
G2, and with the new productions

s1 : S → N1

s2 : S → L1

Let G4 be the grammar obtained from G3 by deleting the production s1, but with the same
start symbol. Clearly, G3 and G4 have the same nonterminals, and yield(M) is the same
in both grammars, for M ≠ S. Also, L(G3) = L(G1) ∪ L(G2) and L(G4) = L(G2); hence,
L(G4) ⊆ L(G3), and L(G3) ⊆ L(G4) iff L(G1) ⊆ L(G2).
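The grammar construction can be checked on small instances with a minimal Python sketch. It assumes grammars are given as dictionaries from nonterminals to lists of right-hand sides, that no production is empty, and that the two grammars share only their extensional nonterminals. The names `combine` and `bounded_language`, the toy grammars, and the start symbols `N3` and `L3` are hypothetical, and languages are compared only up to a length bound, so the sketch illustrates the reduction rather than deciding containment.

```python
def bounded_language(grammar, start, max_len):
    """Strings of length <= max_len derivable from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of `grammar` is a terminal.
    """
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:  # least fixed point over the finite set of bounded strings
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                partial = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p) + len(o) <= max_len}
                new = partial - lang[nt]
                if new:
                    lang[nt] |= new
                    changed = True
    return lang[start]


def combine(g1, start1, g2, start2, new_start="S"):
    """Build G3 (union plus S -> start1 | start2) and G4 (G3 without S -> start1)."""
    g3 = {}
    for g in (g1, g2):
        for nt, productions in g.items():
            merged = g3.setdefault(nt, [])
            for rhs in productions:
                if rhs not in merged:
                    merged.append(rhs)
    g4 = {nt: list(productions) for nt, productions in g3.items()}
    g3[new_start] = [(start1,), (start2,)]
    g4[new_start] = [(start2,)]
    return g3, g4


# Toy grammars sharing the extensional nonterminals N1 -> a and N2 -> b.
g1 = {"N1": [("a",)], "N2": [("b",)], "N3": [("N1", "N2")]}
g2 = {"N1": [("a",)], "N2": [("b",)], "L3": [("N1", "N2"), ("N1",)]}

g3, g4 = combine(g1, "N3", g2, "L3")
l3 = bounded_language(g3, "S", 4)
l4 = bounded_language(g4, "S", 4)
print(l3 <= l4)  # True here, since L(G1) = {ab} is contained in L(G2) = {ab, a}
```

Replacing `g2` by a grammar that generates only `a` makes the final comparison false, which mirrors the equivalence between L(G3) ⊆ L(G4) and L(G1) ⊆ L(G2) on these bounded languages.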

Apply Algorithm 4.2 to G3 and G4 to produce P and Q respectively, making sure that
the order of nonterminals is preserved in the rule heads of P and Q.

Lemma 4.32 P ⊆ Q (P ≡ Q) iff L(G1) ⊆ L(G2).

Proof.  By  Lemmas  4.30,  4.31  and  4.9  ,  and  by  construction.  □ 

Theorem  4.3  Let  P  and  Q  be  safe,  single-IDB  programs  with  a  single  recursive  rule 
and  nine  initialisation  rules.  Then,  the  containment  or  equivalence  of  such  programs  is 
undecidable. 

Proof.  By  Lemmas  4.32  and  4.4.  □ 

Theorem  4.4  The  sequencability  of  single-IDB  programs  is  undecidable. 

Proof. Let P and Q be as above. Assume that P has the recursive rule r1 and the nine
initialisation rules i1, ..., i9, and that Q consists of the recursive rule r2 and the same nine
initialisation rules. Add the atom f(R) to the body of r1 to obtain the rule r1', and let T be
the program consisting of r1', r2, i1, ..., i9. Now, since the “home” position for R is occupied
by the persistent variable Z in every p-atom in either recursive rule, we may conclude that
the recursive rule r1' is used at most once in any top-down expansion that is not contained
in the initialisation rule i3, and that r1' is used to expand the root in such a case^. Thus
the yield of S in the program represents L(G1) ∪ L(G2), but the yield of S in the sequenced
program represents L(G2), and our result follows as above. □


^The body of the resulting expansion contains the atom f(R), but this atom is irrelevant to the
sequencability of T.


Chapter  5 

Concluding  remarks 


In  this  report,  we  investigate  opportunities  to  optimize  recursive  Horn-clause  programs 
through  transformations  to  simpler,  more  efficiently  evaluable  programs.  We  focus  our 
attention  on  optimizations  that  may  be  described  in  terms  of  normal  forms  for  the  proof 
trees  generated  by  the  program  in  question.  We  introduce  the  idea  of  subtree  eliminations 
as  a  way  to  describe  normal  forms,  and  present  a  uniform  approach  to  the  development  of 
sufficient  conditions  for  the  detection  of  the  applicability  of  a  normal  form  to  the  program. 
We  then  illustrate  this  approach  on  the  detection  of  one-boundedness,  basis-linearizability 
and  sequencability,  and  show  how  the  sufficient  conditions  that  are  generated  may  be  tuned 
to  the  desired  complexity.  We  also  investigate  the  complexity  of  these  three  optimization 
problems;  our  investigation  yields  a  characterization  of  the  complexity  of  conjunctive  query 
containment,  and  tight  undecidability  results  for  the  detection  of  program  equivalence.  Our 
results  are  contained  in  Table  5.1.  The  programs  considered  are  all  Datalog.  The  expression 
≤ i reps means that each recursive rule in the program has at most i occurrences of any
EDB predicate in its body.

Whether sequencability is decidable in any interesting case is an open question. However,
current work that the author is undertaking in conjunction with Tomas Feder seems to
indicate a positive answer to that question.

                        Polynomial time     NP-hard                   Undecidable
-----------------------------------------------------------------------------------------
One-boundedness         linear sirup,       linear sirup,             never
                        ≤ 1 reps            ≤ 4 reps

Basis-linearizability   bilinear sirup,     bilinear sirup,           1 nonlinear rule,
                        ≤ 1 reps            ≤ 4 reps                  5 basis rules

Sequencability          ???                 2 recursive rules         2 recursive rules,
                                            (both linear),            9 basis rules
                                            ≤ 3 reps, 1 basis rule

Table 5.1: Complexity results.